(1) Atomic Memory Operation (AMO)
This patch changes how RISC-V AMO instructions are implemented. For each
AMO, instead of issuing a locking load and an unlocking store request to
downstream memory system, this patch issues a single memory request that
contains a corresponding AtomicOpFunctor to the memory system. Once the
memory system receives the request, the atomic operation is executed in
one single step.
This patch also changes how AMO instructions handle acquire and release
flags in AMOs (e.g., amoadd.aq and amoadd.rl). If an AMO is associated
with an acquire flag, a memory fence is inserted after the AMO completes
as a micro-op. If an AMO is associated with a release flag, another
memory fence is inserted before the AMO executes. If both flags are
specified, the AMO is broken down into a sequence of 3 micro-ops:
mem fence -> atomic RMW -> mem fence. This change makes this AMO
implementation comply to the release consistency model.
(2) Load-Reserved (LR) and Store-Conditional (SC)
Addresses locked by LR instructions are tracked in a stack data
structure. LR instruction pushes its target address to the stack, and SC
instruction pops the top address from the stack. As specified by RISC-V
ISA, a SC fails if its target address does not match with the most recent
LR.
Previously, there was a single stack for all hardware thread contexts.
A shared stack between thread contexts can lead to a infinite sequence
of failed SCs if LRs from other threads keep pushing new addresses to
this stack.
This patch gives each context its private stack to address the problem.
This patch also adds extra memory fence micro-ops to lr/sc to guarantee
a correct execution order of memory instructions with respect to release
consistency model.
Change-Id: I1e95900367c89dd866ba872a5203f63359ac51ae
Reviewed-on: https://gem5-review.googlesource.com/c/8189
Reviewed-by: Alec Roelke <ar4jc@virginia.edu>
Maintainer: Alec Roelke <ar4jc@virginia.edu>
This patch enables all 4 CPU models (AtomicSimpleCPU, TimingSimpleCPU,
MinorCPU and DerivO3CPU) to issue atomic memory (AMO) requests to memory
system.
Atomic memory instruction is treated as a special store instruction in
all CPU models.
In simple CPUs, an AMO request with an associated AtomicOpFunctor is
simply sent to L1 dcache.
In MinorCPU, an AMO request bypasses store buffer and waits for any
conflicting store request(s) currently in the store buffer to retire
before the AMO request is sent to the cache. AMO requests are not buffered
in the store buffer, so their effects appear immediately in the cache.
In DerivO3CPU, an AMO request is inserted in the store buffer so that it
is delivered to the cache only after all previous stores are issued to
the cache. Data forwarding between between an outstanding AMO in the
store buffer and a subsequent load is not allowed since the AMO request
does not hold valid data until it's executed in the cache.
This implementation assumes that a target ISA implementation must insert
enough memory fences as micro-ops around an atomic instruction to
enforce a correct order of memory instructions with respect to its
memory consistency model. Without extra memory fences, this implementation
can allow AMOs and other memory instructions that do not conflict
(i.e., not target the same address) to reorder.
This implementation also assumes that atomic instructions execute within
a cache line boundary since the cache for now is not able to execute an
operation on two different cache lines in one single step. Therefore,
ISAs like x86 that require multi-cache-line atomic instructions need to
either use a pair of locking load and unlocking store or change the
cache implementation to guarantee the atomicity of an atomic
instruction.
Change-Id: Ib8a7c81868ac05b98d73afc7d16eb88486f8cf9a
Reviewed-on: https://gem5-review.googlesource.com/c/8188
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
This patch implements FUTEX_WAKE_OP operation in the futex syscall.
Below is its description:
int futex(int *uaddr, int futex_op, int val,
const struct timespec *timeout,
int *uaddr2, int val3);
This operation was added to support some user-space use cases where more
than one futex must be handled at the same time. The most notable
example is the implementation of pthread_cond_signal(3), which requires
operations on two futexes, the one used to implement the mutex and the
one used in the implementation of the wait queue associated with the
condition variable. FUTEX_WAKE_OP allows such cases to be implemented
without leading to high rates of contention and context switching.
Reference: http://man7.org/linux/man-pages/man2/futex.2.html
Change-Id: I215f3c2a7bdc6374e5dfe06ee721c76933a10f2d
Reviewed-on: https://gem5-review.googlesource.com/c/9630
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Brandon Potter <Brandon.Potter@amd.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
This patch supports FUTEX_CMP_REQUEUE operation. Below is its
description from Linux man page:
futex syscall: int futex(int *uaddr, int futex_op, int val,
const struct timespec *timeout,
int *uaddr2, int val3);
This operation first checks whether the location uaddr still contains
the value val3. If not, the operation fails with the error EAGAIN.
Otherwise, the operation wakes up a maximum of val waiters that are
waiting on the futex at uaddr. If there are more than val waiters, then
the remaining waiters are removed from the wait queue of the source
futex at uaddr and added to the wait queue of the target futex at
uaddr2. The val2 argument specifies an upper limit on the number of
waiters that are requeued to the futex at uaddr2.
Reference: http://man7.org/linux/man-pages/man2/futex.2.html
Change-Id: I6d2ebd19a935b656d19d8342f7ab450c0d2031f4
Reviewed-on: https://gem5-review.googlesource.com/c/9629
Reviewed-by: Brandon Potter <Brandon.Potter@amd.com>
Maintainer: Brandon Potter <Brandon.Potter@amd.com>
In case of failure, a syscall returns a negative value encoding the
error code. This patch makes the risc-v implementation returns the
encoded value instead of its absolute value upon a failure of a syscall.
Change-Id: I6032b0337fe1cff5b326dbc6bb3b87a415f03300
Reviewed-on: https://gem5-review.googlesource.com/c/9627
Reviewed-by: Alec Roelke <ar4jc@virginia.edu>
Maintainer: Alec Roelke <ar4jc@virginia.edu>
When a thread is suspended, all instructions after the suspension need
to be discarded since the thread will take a different execution stream
when it wakes up.
To do that, in MinorCPU, whenever a thread gets suspended, we change the
current execution stream by updating the current branch with
BranchData::SuspendThread reason.
Change-Id: I7cdcda22c1cf6e8ac8db8800b7d9ec052433fdf3
Reviewed-on: https://gem5-review.googlesource.com/c/9626
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Giacomo Gabrielli <giacomo.gabrielli@gmail.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
When a thread calls exit_group, in addition to halting the thread
itself, it needs to halt all other threads in its group (i.e., threads
sharing the same thread group ID). This patch enables threads to do
that.
Change-Id: Ib2e158fb27cf98843f177a64a2d643b1bbc94d03
Reviewed-on: https://gem5-review.googlesource.com/c/9623
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
This patch adds support for two operations in futex system call:
FUTEX_WAIT_BITSET and FUTEX_WAKE_BITSET. The two operations are used to
selectively wake up a certain thread waiting on a futex variable.
Basically each thread waiting on a futex variable is associated with a
bitset that is checked when another thread tries to wake up all threads
waiting on the futex variable.
Change-Id: I2300e53b144d8fae226423fa2efb0238c1d93ef9
Reviewed-on: https://gem5-review.googlesource.com/c/9621
Reviewed-by: Brandon Potter <Brandon.Potter@amd.com>
Maintainer: Brandon Potter <Brandon.Potter@amd.com>
When a thread executed an exit syscall in SE mode, the thread context
was removed immediately in the same cycle, which left inflight squash
operations and trap event incomplete. The problem happened when a new
thread was assigned to the CPU later. The new thread started with some
incomplete transactions of the previous thread (e.g., squashing). This
problem could cause incorrect execution flow for the new thread (i.e.,
pc was not reset properly at the exit point), deadlock (i.e., some
stage-to-stage signals were not reset) and incorrect rename map between
logical and physical registers.
This patch adds a new state called 'Halting' to the thread context and
defers removing thread context from a CPU until a trap event initiated
by an exit syscall execution is processed. This patch also makes sure
that the removal of a thread context happens after all inflight
transactions of the to-be-removed thread in the pipeline complete.
Change-Id: If7ef1462fb8864e22b45371ee7ae67e2a5ad38b8
Reviewed-on: https://gem5-review.googlesource.com/c/8184
Reviewed-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
Checking if cpsr.mode is equal to MODE_HYP doesn't work for AArch64.
This is because AArch64 is using different modes when in EL2, like EL2T
and EL2H.
This made Virtual Interrupts to be triggered even when executing in EL2
(hypervisor) whereas they should interrupt the scheduled VM only
(Non-Secure EL0 and EL1). This patch is fixing this by using the generic
currEL() helper for getting the exception level, which is working for
both AArch32 and AArch64.
Change-Id: I08640050ef06261f280ba1e63ca9f32c805af845
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/16202
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Differently from ArmSPIs, ArmPPI interrupts need to be instantiated by
giving a ThreadContext pointer in the ArmPPIGen::get() method. Since the
PMU is registering the ThreadContext only at ISA startup time, ArmPPI
generation in deferred until the PMU has a non NULL pointer.
Change-Id: I17daa6f0e355363b8778d707b440cab9f75aaea2
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/16204
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
A version of Linux kernel initializes counters before enabling them.
Without this change, gem5 overwrites the value of counter, which causes
incorrect counter values derived by kernel.
Change-Id: If0c515111103018d5f65f74434d7711a67aeaee4
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/16203
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
In every arm platform which is making use of them, mem_regions are
interpreted as a pair of start address and size. However arm
SimpleSystem, which is using VExpress_GEM5_V1, is interpreting them as
start address and end address. This patch is fixing this mismatch.
Change-Id: I0b2a2193cd07fbc5430f233438269a9c7c353df9
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Ciro Santilli <ciro.santilli@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/16205
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Fence instruction origially had two flags NonSpeculative and
MemBarrier. In O3 model, MemBarrier instructions are inserted
into the instruction queue by the InstructionQueue::insertBarrier (at
src/cpu/o3/iew_impl.hh:1083). Barrier instructions are implicitly
assumed to be non-speculative.
Adding NonSpeculative flag to fence instruction makes it inserted into
the instruction queue twice (at src/cpu/o3/iew_impl.hh:1083 and :1111).
This can lead to a deadlock if both pointers to the instruction are not
cleared from the queue when the instruction retires.
This patch removes NonSpeculative flag from the fence inst.
Change-Id: I26573d12a0b52f43b73c0e51158286dc98d05ea4
Reviewed-on: https://gem5-review.googlesource.com/c/8183
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Alec Roelke <ar4jc@virginia.edu>
Maintainer: Alec Roelke <ar4jc@virginia.edu>
When a thread is activated by another thread calling a clone system
call, the child thread's context is initialized in the middle of the
clone system call and before the context is fully initialized.
Therefore, the child thread starts fetching an unitialized PC, which
could lead to a page fault.
This patch adds a pipeline wakeup event that is scheduled later in the
cycle when the thread is activated. This event ensures that the first
fetch only happens after the thread context is fully initialized
(e.g., in case of clone syscall, it is when the parent thread copies
its context over to the child thread).
When a thread first starts or wakes up, input queue to the Fetch2 stage
needs to be drained since the execution flow is likely to change and
previously fetched instructions in the queue may no longer be in the
correct flow. This patch dumps/drains all inputs in the input queue
of a thread context in the Fetch2 stage when the associated thread wakes
up.
Change-Id: Iad970638e435858b7289cd471158cc0afdbbb0e5
Reviewed-on: https://gem5-review.googlesource.com/c/8182
Reviewed-by: Brandon Potter <Brandon.Potter@amd.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Brandon Potter <Brandon.Potter@amd.com>
Since f2bda876f7, the build system started
adding a length for generated blobs as in:
const std::size_t variable_len = 123;
There were two types of blob files, ones with a header and the ones
without.
The ones with the header, also include the header in the .cc of the blob,
which contains a declaration:
extern const std::size_t variable_len;
Therefore, the ones without header, don't have that extern declaration,
which makes them static according to the C++ standard.
clang then correctly interprets that as problematic due to
-Wunused-const-variable, while GCC does not notice this.
This patch removes the length declaration from the blob files that don't
have the header. Those files currently don't use the length.
Change-Id: I3fc61b28f887fc1015288857328ead2f3b34c6e6
Reviewed-on: https://gem5-review.googlesource.com/c/15955
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
A recent patch add the use of the macro:
CMSG_ALIGN
This macro is not very cross-platform, and needs to be
defined according to the platform.
This patch defines the missing macro on MacOS.
Change-Id: I582f69e652dc060b4532358141179ad6d37eafc7
Reviewed-on: https://gem5-review.googlesource.com/c/16102
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Brandon Potter <Brandon.Potter@amd.com>
This implementation is based in the description available in:
Jinchun Kim, Seth H. Pugsley, Paul V. Gratz, A. L. Narasimha Reddy,
Chris Wilkerson, and Zeshan Chishti. 2016.
Path confidence based lookahead prefetching.
In The 49th Annual IEEE/ACM International Symposium on Microarchitecture
(MICRO-49). IEEE Press, Piscataway, NJ, USA, Article 60, 12 pages.
Change-Id: I4b8b54efef48ced7044bd535de9a69bca68d47d9
Reviewed-on: https://gem5-review.googlesource.com/c/14819
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Python 2.7 requires a workaround when wrapping exit objects to
explicitly convert the return of getCode() to int to not confuse
sys.exit. This workaround isn't needed and doesn't work on Python 3
since it doesn't have a separate long integer type.
Change-Id: I57bc3fd8f4699676c046ece8a52baa2796959ffd
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/15978
Reviewed-by: Gabe Black <gabeblack@google.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Some tests are really just a wrapper around a test script in
configs/. Add a helper method to wrap these scripts to make sure they
are executed in a consistent environment. This wrapper sets up a
global environment that is identical to that created by main() when it
executes the script. Unlike the old wrappers, it updates the module
search path to make relative imports work correctly in Python 3.
Change-Id: Ie9f81ec4e2689aa8cf5ecb9fc8025d3534b5c9ca
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/15976
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
Most Ruby tests assume that the highest frequency in the system under
test is 1GHz and limits the global tick rate to this frequency. This
assumption is broken since the default Ruby configuration scripts
clock the CPU at 2Ghz, which results in warnings and sometimes
incorrect behaviour.
Change-Id: I4b204660862ce3b0ea4a13df42caacd4398fef8c
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/15975
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Latest-gen. vector/SIMD extensions, including the Arm Scalable Vector
Extension (SVE), introduce the notion of a predicate register file.
This changeset adds this feature across architectures and CPU models.
Change-Id: Iebcadbad89c0a582ff8b1b70de353305db603946
Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/13715
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
At Ia49298304f658701ea0800bd79e08db404a655c3 we removed the default
kernel and DTB filenames from FSConfig.py.
However, the regression tests rely on that to find those blobs.
This commit restores those default filenames just for the config of the
regression tests.
Change-Id: I9d7d869b0087ee8a3b63088693f753a703ead5d6
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/15957
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Before this commit, there were default magic DTB and kernel filenames
for some platforms.
This was inelegant and error prone, as it refered to out-of-tree files,
and set defaults which users almost always want to customize with
explicit command line options.
One result of this is that a wrong exception could be thrown if --kernel
was given but not --machine-type, since the default machine type
VExpress_EMM had a default kernel, and the code would always search for
the default filename even though --kernel was given:
IOError: Can't find file 'vmlinux.aarch32.ll_20131205.0-gem5' on path.
The defaults existed only for older machine types, and not for the
usually recommended VExpress_GEM5_V1, which suggests that this
deprecation should not affect many users.
Change-Id: Ia49298304f658701ea0800bd79e08db404a655c3
Reviewed-on: https://gem5-review.googlesource.com/c/15898
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>