Turn the functions within it into virtual methods on the ISA classes.
Eliminate the implementation in MIPS, which was just copy pasted from
Alpha long ago. Fix some minor style issues in ARM. Remove templating.
Switch from using an "XC" type parameter to using the ThreadContext *
installed in all ISA classes.
The ARM version of these functions actually depend on the ExecContext
delaying writes to MiscRegs to work correctly. More insiduously than
that, they also depend on the conicidental ThreadContext like
availability of certain functions like contextId and getCpuPtr which
come from the class which happened to implement the type passed into XC.
To accomodate that, those functions need both a real ThreadContext, and
another object which is either an ExecContext or a ThreadContext
depending on how the method is called.
Jira Issue: https://gem5.atlassian.net/browse/GEM5-1053
Change-Id: I68f95f7283f831776ba76bc5481bfffd18211bc4
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/50087
Maintainer: Gabe Black <gabe.black@gmail.com>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Instruction fetch should not commence if there already is an
instruction-count event in the queue.
The most conspicuous scenario where this leads to obvious breakage,
is guest debugging. Imagine the first bytes in the program pointed to
by _start are invalid instruction encoding, and we pass the --wait-gdb
flag. Then in GDB we set $pc to point to valid instructions, and we
"continue". gem5 will abort with "invalid instruction".
This is not how real targets behave: neither software- (e.g. ptrace)
based debuggers, nor low-level (e.g. OpenOCD or XMD connected over
JTAG to debug early initialization code eg when the MMU has not been
switched on yet, etc.) Fetching should start from where $pc was set
to. This patch tries to model this behavior.
Change-Id: Ibce6fdbbb082edf1073ae96745bc7867878f99ca
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/27587
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Apply the gem5 namespace to the codebase.
Some anonymous namespaces could theoretically be removed,
but since this change's main goal was to keep conflicts
at a minimum, it was decided not to modify much the
general shape of the files.
A few missing comments of the form "// namespace X" that
occurred before the newly added "} // namespace gem5"
have been added for consistency.
std out should not be included in the gem5 namespace, so
they weren't.
ProtoMessage has not been included in the gem5 namespace,
since I'm not familiar with how proto works.
Regarding the SystemC files, although they belong to gem5,
they actually perform integration between gem5 and SystemC;
therefore, it deserved its own separate namespace.
Files that are automatically generated have been included
in the gem5 namespace.
The .isa files currently are limited to a single namespace.
This limitation should be later removed to make it easier
to accomodate a better API.
Regarding the files in util, gem5:: was prepended where
suitable. Notice that this patch was tested as much as
possible given that most of these were already not
previously compiling.
Change-Id: Ia53d404ec79c46edaa98f654e23bc3b0e179fe2d
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/46323
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
In this context, the decoder width is the number of bytes that are fed
into the decoder at once. This is frequently the same as the size of an
instruction, but in instructions with occasionally variable instruction
sizes (ARM, RISCV), or extremely variable instruction sizes (x86) there
may be no relation.
Rather than determining the amount of data to feed to the decoder based
on a MachInst type defined by each ISA, this new interface adds some new
properties to the base InstDecoder class each arch specific decoder
inherits from. These are the size of the incoming buffer, a pointer to
wherever that data should end up, and a mask for masking a PC value so
it aligns with the instruction size.
These values are filled in by a templated InstDecoder constructor which
is templated based on what would have historically been the MachInst
type.
Because the "moreBytes" method would historically accept a parameter of
type MachInst, this parameter has also been eliminated. Now, the
decoder's parent object should use the pointer and size values to fill
in the buffer moreBytes reads. Then when moreBytes is called, it just
uses the buffer without having to show what its type is externally.
Change-Id: I0642cdb6a61e152441ca4ce47d748639175cda90
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/40175
Reviewed-by: Gabe Black <gabe.black@gmail.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This was mostly not used to begin with, but also when it was used, it
would obscure places where there were types, functions, etc, which were
switched between ISAs at compile time, and which would need to be
cleaned up to allow more than one ISA at a time.
Change-Id: Ieb372feff91b7e946b477fb78e54bcd0c2138966
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/39655
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
If the memory system can provide a back door to memory, store that, and
use it for subsequent accesses to the range it covers. For now, this
covers only fetch. That's because fetch will generally happen more than
loads and stores, and because it's relatively simple to implement since
we can ignore atomic operations, etc.
Some limitted benchmarking suggests that this speeds up x86 linux boot
by about 20%, although my modifications to the config to remove caching
(which blocks the back door mechanism) also made gem5 crash, so it's
hard to say for sure if that's a valid result. The crash happened in the
same way before and after, so it's probably at least relatively
representative.
While this gives a pretty substantial performance boost, it will prevent
statistics from being collected at the memory, or on intermediate objects
in the interconnect like the bus. That is to be expected with this
memory mode, however.
Change-Id: I73f73017e454300fd4d61f58462eb4ec719b8d85
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/36979
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The create() method on Params structs usually instantiate SimObjects
using a constructor which takes the Params struct as a parameter
somehow. There has been a lot of needless variation in how that was
done, making it annoying to pass Params down to base classes. Some of
the different forms were:
const Params &
Params &
Params *
const Params *
Params const*
This change goes through and fixes up every constructor and every
create() method to use the const Params & form. We use a reference
because the Params struct should never be null. We use const because
neither the create method nor the consuming object should modify the
record of the parameters as they came in from the config. That would
make consuming them not idempotent, and make it impossible to tell what
the actual simulation configuration was since it would change from any
user visible form (config script, config.ini, dot pdf output).
Change-Id: I77453cba52fdcfd5f4eec92dfb0bddb5a9945f31
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35938
Reviewed-by: Gabe Black <gabeblack@google.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The byteEnable variable is used for masking bytes in a memory request.
The default behaviour is to provide from the ExecContext to the CPU
(and then to the LSQ) an empty vector, which is the same as providing
a vector where every element is true.
Such vectors basically mean: do not mask any byte in the memory request.
This behaviour adds more complexity to the downstream LSQs, which now
have to distinguish between an empty and non-empty byteEnable.
This patch is simplifying things by transforming an empty vector into
a all true one, making sure the CPUs are always receiving a non empty
byteEnable.
JIRA: https://gem5.atlassian.net/browse/GEM5-196
Change-Id: I1d1cecd86ed64c53a314ed700f28810d76c195c3
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/23285
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Tested-by: kokoro <noreply+kokoro@google.com>
This patch sets the faulting flag in atomic, timing, minor and o3 CPU
models.
It also fixes the minor/timing CPU models which were not respecting the
ExecFaulting flag. This is now checked before calling dump() on the
tracing object, to bring it in line with the other CPU models.
Change-Id: I9c7b64cc5605596eb7fcf25fdecaeac5c4b5e3d7
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30135
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The new local access mechanism installs a callback in the request which
implements what the mmapped IPR was doing. That avoids having to have
stubs in ISAs that don't have mmapped IPRs, avoids having to encode
what to do to communicate from the TLB and the mmapped IPR functions,
and gets rid of another global ISA interface function and header files.
Jira Issue: https://gem5.atlassian.net/browse/GEM5-187
Change-Id: I772c2ae2ca3830a4486919ce9804560c0f2d596a
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/23188
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This change is based on modify the way we move the AtomicOpFunctor*
through gem5 in order to mantain proper ownership of the object and
ensuring its destruction when it is no longer used.
Doing that we fix at the same time a memory leak in Request.hh
where we were assigning a new AtomicOpFunctor* without destroying the
previous one.
This change creates a new type AtomicOpFunctor_ptr as a
std::unique_ptr<AtomicOpFunctor> and move its ownership as needed. Except
for its only usage when AtomicOpFunc() is called.
Change-Id: Ic516f9d8217cb1ae1f0a19500e5da0336da9fd4f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/20919
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
data_write_req byteEnable which is used in ARM SVE partial writes was not
being zeroed between writes.
As a result, non-SVE memory write instructions such as STP that followed
SVE memory write instructions could still have the write mask active.
This could lead to wrong simulation behaviour, and to an assertion failure:
src/mem/packet.hh:1211: void Packet::writeData(uint8_t*) const: Assertion
`req->getByteEnable().size() == getSize()' failed. '`
Change-Id: I74b5a82675e9923b0ffdf2c1dd9afb00c91cb204
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/20448
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This changeset adds support for partial (or masked) loads/stores, i.e.
loads/stores that can disable accesses to individual bytes within the
target address range. In addition, this changeset extends the code to
crack memory accesses across most CPU models (TimingSimpleCPU still
TBD), so that arbitrarily wide memory accesses are supported. These
changes are required for supporting ISAs with wide vectors.
Additional authors:
- Gabor Dozsa <gabor.dozsa@arm.com>
- Tiago Muck <tiago.muck@arm.com>
Change-Id: Ibad33541c258ad72925c0b1d5abc3e5e8bf92d92
Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/13518
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
This patch enables all 4 CPU models (AtomicSimpleCPU, TimingSimpleCPU,
MinorCPU and DerivO3CPU) to issue atomic memory (AMO) requests to memory
system.
Atomic memory instruction is treated as a special store instruction in
all CPU models.
In simple CPUs, an AMO request with an associated AtomicOpFunctor is
simply sent to L1 dcache.
In MinorCPU, an AMO request bypasses store buffer and waits for any
conflicting store request(s) currently in the store buffer to retire
before the AMO request is sent to the cache. AMO requests are not buffered
in the store buffer, so their effects appear immediately in the cache.
In DerivO3CPU, an AMO request is inserted in the store buffer so that it
is delivered to the cache only after all previous stores are issued to
the cache. Data forwarding between between an outstanding AMO in the
store buffer and a subsequent load is not allowed since the AMO request
does not hold valid data until it's executed in the cache.
This implementation assumes that a target ISA implementation must insert
enough memory fences as micro-ops around an atomic instruction to
enforce a correct order of memory instructions with respect to its
memory consistency model. Without extra memory fences, this implementation
can allow AMOs and other memory instructions that do not conflict
(i.e., not target the same address) to reorder.
This implementation also assumes that atomic instructions execute within
a cache line boundary since the cache for now is not able to execute an
operation on two different cache lines in one single step. Therefore,
ISAs like x86 that require multi-cache-line atomic instructions need to
either use a pair of locking load and unlocking store or change the
cache implementation to guarantee the atomicity of an atomic
instruction.
Change-Id: Ib8a7c81868ac05b98d73afc7d16eb88486f8cf9a
Reviewed-on: https://gem5-review.googlesource.com/c/8188
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
The AtomicSimpleCPU used to be able to access memory directly to speed
up simulation if no caches are used. This is fine as long as no
switching between CPU models is required. In order to switch to a new
CPU model that requires caches, we currently need to checkpoint the
system and restore it into a new configuration. The new
'atomic_noncaching' memory mode provides a solution that avoids this
issue since caches are bypassed in this mode. This changeset removes
the old fastmem option from the AtomicSimpleCPU and introduces a new
CPU, NonCachingSimpleCPU, which derives from the AtomicSimpleCPU.
The NonCachingSimpleCPU uses the same mechanism as the AtomicSimpleCPU
used to use when accessing memory in when fastmem was enabled.
This changeset also introduces a new switcheroo test that tests
switching between a NonCachingSimpleCPU and a TimingSimpleCPU with
caches.
Change-Id: If01893f9b37528b14f530c11ce6f53c097582c21
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/12419
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
This patch is changing the underlying type for RequestPtr from Request*
to shared_ptr<Request>. Having memory requests being managed by smart
pointers will simplify the code; it will also prevent memory leakage and
dangling pointers.
Change-Id: I7749af38a11ac8eb4d53d8df1252951e0890fde3
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/10996
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Move the code responsible for performing the actual probe point notify
into BaseCPU. Use BaseCPU activateContext and suspendContext to keep
track of sleep cycles. Create a probe point (ppActiveCycles) that does
not count cycles where the processor was asleep. Rename ppCycles
to ppAllCycles to reflect its nature.
Change-Id: I1907ddd07d0ff9f2ef22cc9f61f5f46c630c9d66
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/5762
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
If the CPU has been clock gated for a sufficient amount of time
(configurable via pwrGatingLatency), the CPU will go into the OFF
power state. This does not model hardware, just behaviour.
Change-Id: Ib3681d1ffa6ad25eba60f47b4020325f63472d43
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/3969
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
This changeset adds functionality that allows system calls to retry without
affecting thread context state such as the program counter or register values
for the associated thread context (when system calls return with a retry
fault).
This functionality is needed to solve problems with blocking system calls
in multi-process or multi-threaded simulations where information is passed
between processes/threads. Blocking system calls can cause deadlock because
the simulator itself is single threaded. There is only a single thread
servicing the event queue which can cause deadlock if the thread hits a
blocking system call instruction.
To illustrate the problem, consider two processes using the producer/consumer
sharing model. The processes can use file descriptors and the read and write
calls to pass information to one another. If the consumer calls the blocking
read system call before the producer has produced anything, the call will
block the event queue (while executing the system call instruction) and
deadlock the simulation.
The solution implemented in this changeset is to recognize that the system
calls will block and then generate a special retry fault. The fault will
be sent back up through the function call chain until it is exposed to the
cpu model's pipeline where the fault becomes visible. The fault will trigger
the cpu model to replay the instruction at a future tick where the call has
a chance to succeed without actually going into a blocking state.
In subsequent patches, we recognize that a syscall will block by calling a
non-blocking poll (from inside the system call implementation) and checking
for events. When events show up during the poll, it signifies that the call
would not have blocked and the syscall is allowed to proceed (calling an
underlying host system call if necessary). If no events are returned from the
poll, we generate the fault and try the instruction for the thread context
at a distant tick. Note that retrying every tick is not efficient.
As an aside, the simulator has some multi-threading support for the event
queue, but it is not used by default and needs work. Even if the event queue
was completely multi-threaded, meaning that there is a hardware thread on
the host servicing a single simulator thread contexts with a 1:1 mapping
between them, it's still possible to run into deadlock due to the event queue
barriers on quantum boundaries. The solution of replaying at a later tick
is the simplest solution and solves the problem generally.
Add functionality to the BaseCPU that will put the entire CPU
into a low-power idle state whenever all threads in it are idle.
Change-Id: I984d1656eb0a4863c87ceacd773d2d10de5cfd2b
In general, the ThreadID parameter is unnecessary in the memory system
as the ContextID is what is used for the purposes of locks/wakeups.
Since we allocate sequential ContextIDs for each thread on MT-enabled
CPUs, ThreadID is unnecessary as the CPUs can identify the requesting
thread through sideband info (SenderState / LSQ entries) or ContextID
offset from the base ContextID for a cpu.
This is a re-spin of 20264eb after the revert (bd1c6789) and includes
some fixes of that commit.
The following patches had unexpected interactions with the current
upstream code and have been reverted for now:
e07fd01651f3: power: Add support for power models
831c7f2f9e39: power: Low-power idle power state for idle CPUs
4f749e00b667: power: Add power states to ClockedObject
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
--HG--
extra : amend_source : 0b6fb073c6bbc24be533ec431eb51fbf1b269508
In general, the ThreadID parameter is unnecessary in the memory system
as the ContextID is what is used for the purposes of locks/wakeups.
Since we allocate sequential ContextIDs for each thread on MT-enabled
CPUs, ThreadID is unnecessary as the CPUs can identify the requesting
thread through sideband info (SenderState / LSQ entries) or ContextID
offset from the base ContextID for a cpu.