The approach we currently use to register our native modules doesn't
work on Python 3. Convert the code to use the Python inittab instead
of the old ad-hoc method.
Change-Id: I961f8a33993c621473732faeaab955a882769a4b
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/15979
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Gabe Black <gabeblack@google.com>
The constructor assumes the number of nodes (i.e. controllers) equal to
the number of external nodes.
This is a not necessarily valid for all cases (e.g MESI_Three_Level -
where L0s are directly connected to L1s).
MachineType_base_number(MachineType_NUM) provides the total number of
controllers.
Signed-off-by: Pouya Fotouhi <pfotouhi@ucdavis.edu>
Change-Id: Id906099dc967ec70aa34dedb0b55351031ff242c
Reviewed-on: https://gem5-review.googlesource.com/c/15716
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
Adding back some changes done in patch 676ae57827.
Transient state IS_I, STALE_DATA, Data_Stale event are necessary.
Issue: (cacheline A, initial state for P0 and P1 is I)
| P0 | P1 |
|GETX (A)| |
| |GETS (A)|
|Inv_All | |
P1 never sends the ACK - deadlock
It should ACK, later upon data use it as stale data, and got to I.
Solution:
P1(A):
GETS: I->IS
Inv_All: IS->IS_I, Send ACK
Data: IS_I->I, STALE_DATA to L0
Signed-off-by: Pouya Fotouhi <pfotouhi@ucdavis.edu>
Change-Id: I1e7b2c05439d08579c68d8eb444e0f332e75e07f
Reviewed-on: https://gem5-review.googlesource.com/c/15715
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
argv[0] is already part of sys.argv, so we don't need to add an
additional argument in front of sys.argv.
The argv[0] which is used in gem5 config scripts is the name of the
config script itself. While it might seem a little odd for the name of
a systemc program to end in .py, it's as arbitrary as any other name,
and generally shouldn't cause a problem. If some other more
sophisticated mechanism for setting argv[0] is necessary, then the user
can write a very slightly more complicated version of this script with
additional logic.
Change-Id: Ifd5d8a02d3cd5db76054151ed6c7a7b1f8495fa8
Reviewed-on: https://gem5-review.googlesource.com/c/16342
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Gabe Black <gabeblack@google.com>
This config will just run the sc_main function (which must have been
provided in c++ somehow), passing through any of the scripts command
line arguments to sc_main.
Needing to do this sort of thing is common enough that there should be
a canned config which supports it.
Change-Id: I8f88ba4776b9ec919dd8145a58cd856e11ac4e77
Reviewed-on: https://gem5-review.googlesource.com/c/16287
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
g++ complained about comparing an signed int loop counter with the
return value of a size() function. This change changes it to an
unsigned to make g++ happy/quiet.
Change-Id: I28fa79c448465b24d77b5623860f9b991f313561
Reviewed-on: https://gem5-review.googlesource.com/c/16286
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Gabe Black <gabeblack@google.com>
The loop accidentally used a = when it should have used a +=, meaning
only the sources from the final filter would be used.
Change-Id: Ie066a5f85696f05d9ad3cf61f928b12deb39475b
Reviewed-on: https://gem5-review.googlesource.com/c/16285
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Gabe Black <gabeblack@google.com>
scons will attempt to use insert() on the value of RPATH when adding in
additional values. That will fail if RPATH is a Literal.
Change-Id: I9da75c6b189f12843a3452cdf92f7b56c0ec340b
Reviewed-on: https://gem5-review.googlesource.com/c/16284
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Gabe Black <gabeblack@google.com>
Now the indirect branch predictor handles its own GHR instead of getting
the one from the direction predictor.
Also, now the commit method of the indirect predictor is called for every
pending branch on an update, as the indirect predictors may want to update
their interal structures/histories with the information of each branch.
Change-Id: I7053fbea42a53960a3bc1ba32912cc99c160511e
Reviewed-on: https://gem5-review.googlesource.com/c/15318
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
(1) Atomic Memory Operation (AMO)
This patch changes how RISC-V AMO instructions are implemented. For each
AMO, instead of issuing a locking load and an unlocking store request to
downstream memory system, this patch issues a single memory request that
contains a corresponding AtomicOpFunctor to the memory system. Once the
memory system receives the request, the atomic operation is executed in
one single step.
This patch also changes how AMO instructions handle acquire and release
flags in AMOs (e.g., amoadd.aq and amoadd.rl). If an AMO is associated
with an acquire flag, a memory fence is inserted after the AMO completes
as a micro-op. If an AMO is associated with a release flag, another
memory fence is inserted before the AMO executes. If both flags are
specified, the AMO is broken down into a sequence of 3 micro-ops:
mem fence -> atomic RMW -> mem fence. This change makes this AMO
implementation comply to the release consistency model.
(2) Load-Reserved (LR) and Store-Conditional (SC)
Addresses locked by LR instructions are tracked in a stack data
structure. LR instruction pushes its target address to the stack, and SC
instruction pops the top address from the stack. As specified by RISC-V
ISA, a SC fails if its target address does not match with the most recent
LR.
Previously, there was a single stack for all hardware thread contexts.
A shared stack between thread contexts can lead to a infinite sequence
of failed SCs if LRs from other threads keep pushing new addresses to
this stack.
This patch gives each context its private stack to address the problem.
This patch also adds extra memory fence micro-ops to lr/sc to guarantee
a correct execution order of memory instructions with respect to release
consistency model.
Change-Id: I1e95900367c89dd866ba872a5203f63359ac51ae
Reviewed-on: https://gem5-review.googlesource.com/c/8189
Reviewed-by: Alec Roelke <ar4jc@virginia.edu>
Maintainer: Alec Roelke <ar4jc@virginia.edu>
This patch enables all 4 CPU models (AtomicSimpleCPU, TimingSimpleCPU,
MinorCPU and DerivO3CPU) to issue atomic memory (AMO) requests to memory
system.
Atomic memory instruction is treated as a special store instruction in
all CPU models.
In simple CPUs, an AMO request with an associated AtomicOpFunctor is
simply sent to L1 dcache.
In MinorCPU, an AMO request bypasses store buffer and waits for any
conflicting store request(s) currently in the store buffer to retire
before the AMO request is sent to the cache. AMO requests are not buffered
in the store buffer, so their effects appear immediately in the cache.
In DerivO3CPU, an AMO request is inserted in the store buffer so that it
is delivered to the cache only after all previous stores are issued to
the cache. Data forwarding between between an outstanding AMO in the
store buffer and a subsequent load is not allowed since the AMO request
does not hold valid data until it's executed in the cache.
This implementation assumes that a target ISA implementation must insert
enough memory fences as micro-ops around an atomic instruction to
enforce a correct order of memory instructions with respect to its
memory consistency model. Without extra memory fences, this implementation
can allow AMOs and other memory instructions that do not conflict
(i.e., not target the same address) to reorder.
This implementation also assumes that atomic instructions execute within
a cache line boundary since the cache for now is not able to execute an
operation on two different cache lines in one single step. Therefore,
ISAs like x86 that require multi-cache-line atomic instructions need to
either use a pair of locking load and unlocking store or change the
cache implementation to guarantee the atomicity of an atomic
instruction.
Change-Id: Ib8a7c81868ac05b98d73afc7d16eb88486f8cf9a
Reviewed-on: https://gem5-review.googlesource.com/c/8188
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
This patch implements FUTEX_WAKE_OP operation in the futex syscall.
Below is its description:
int futex(int *uaddr, int futex_op, int val,
const struct timespec *timeout,
int *uaddr2, int val3);
This operation was added to support some user-space use cases where more
than one futex must be handled at the same time. The most notable
example is the implementation of pthread_cond_signal(3), which requires
operations on two futexes, the one used to implement the mutex and the
one used in the implementation of the wait queue associated with the
condition variable. FUTEX_WAKE_OP allows such cases to be implemented
without leading to high rates of contention and context switching.
Reference: http://man7.org/linux/man-pages/man2/futex.2.html
Change-Id: I215f3c2a7bdc6374e5dfe06ee721c76933a10f2d
Reviewed-on: https://gem5-review.googlesource.com/c/9630
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Brandon Potter <Brandon.Potter@amd.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
This patch supports FUTEX_CMP_REQUEUE operation. Below is its
description from Linux man page:
futex syscall: int futex(int *uaddr, int futex_op, int val,
const struct timespec *timeout,
int *uaddr2, int val3);
This operation first checks whether the location uaddr still contains
the value val3. If not, the operation fails with the error EAGAIN.
Otherwise, the operation wakes up a maximum of val waiters that are
waiting on the futex at uaddr. If there are more than val waiters, then
the remaining waiters are removed from the wait queue of the source
futex at uaddr and added to the wait queue of the target futex at
uaddr2. The val2 argument specifies an upper limit on the number of
waiters that are requeued to the futex at uaddr2.
Reference: http://man7.org/linux/man-pages/man2/futex.2.html
Change-Id: I6d2ebd19a935b656d19d8342f7ab450c0d2031f4
Reviewed-on: https://gem5-review.googlesource.com/c/9629
Reviewed-by: Brandon Potter <Brandon.Potter@amd.com>
Maintainer: Brandon Potter <Brandon.Potter@amd.com>
In case of failure, a syscall returns a negative value encoding the
error code. This patch makes the risc-v implementation returns the
encoded value instead of its absolute value upon a failure of a syscall.
Change-Id: I6032b0337fe1cff5b326dbc6bb3b87a415f03300
Reviewed-on: https://gem5-review.googlesource.com/c/9627
Reviewed-by: Alec Roelke <ar4jc@virginia.edu>
Maintainer: Alec Roelke <ar4jc@virginia.edu>
When a thread is suspended, all instructions after the suspension need
to be discarded since the thread will take a different execution stream
when it wakes up.
To do that, in MinorCPU, whenever a thread gets suspended, we change the
current execution stream by updating the current branch with
BranchData::SuspendThread reason.
Change-Id: I7cdcda22c1cf6e8ac8db8800b7d9ec052433fdf3
Reviewed-on: https://gem5-review.googlesource.com/c/9626
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Giacomo Gabrielli <giacomo.gabrielli@gmail.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
When a thread calls exit_group, in addition to halting the thread
itself, it needs to halt all other threads in its group (i.e., threads
sharing the same thread group ID). This patch enables threads to do
that.
Change-Id: Ib2e158fb27cf98843f177a64a2d643b1bbc94d03
Reviewed-on: https://gem5-review.googlesource.com/c/9623
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
This patch adds support for two operations in futex system call:
FUTEX_WAIT_BITSET and FUTEX_WAKE_BITSET. The two operations are used to
selectively wake up a certain thread waiting on a futex variable.
Basically each thread waiting on a futex variable is associated with a
bitset that is checked when another thread tries to wake up all threads
waiting on the futex variable.
Change-Id: I2300e53b144d8fae226423fa2efb0238c1d93ef9
Reviewed-on: https://gem5-review.googlesource.com/c/9621
Reviewed-by: Brandon Potter <Brandon.Potter@amd.com>
Maintainer: Brandon Potter <Brandon.Potter@amd.com>
When a thread executed an exit syscall in SE mode, the thread context
was removed immediately in the same cycle, which left inflight squash
operations and trap event incomplete. The problem happened when a new
thread was assigned to the CPU later. The new thread started with some
incomplete transactions of the previous thread (e.g., squashing). This
problem could cause incorrect execution flow for the new thread (i.e.,
pc was not reset properly at the exit point), deadlock (i.e., some
stage-to-stage signals were not reset) and incorrect rename map between
logical and physical registers.
This patch adds a new state called 'Halting' to the thread context and
defers removing thread context from a CPU until a trap event initiated
by an exit syscall execution is processed. This patch also makes sure
that the removal of a thread context happens after all inflight
transactions of the to-be-removed thread in the pipeline complete.
Change-Id: If7ef1462fb8864e22b45371ee7ae67e2a5ad38b8
Reviewed-on: https://gem5-review.googlesource.com/c/8184
Reviewed-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
Checking if cpsr.mode is equal to MODE_HYP doesn't work for AArch64.
This is because AArch64 is using different modes when in EL2, like EL2T
and EL2H.
This made Virtual Interrupts to be triggered even when executing in EL2
(hypervisor) whereas they should interrupt the scheduled VM only
(Non-Secure EL0 and EL1). This patch is fixing this by using the generic
currEL() helper for getting the exception level, which is working for
both AArch32 and AArch64.
Change-Id: I08640050ef06261f280ba1e63ca9f32c805af845
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/16202
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Differently from ArmSPIs, ArmPPI interrupts need to be instantiated by
giving a ThreadContext pointer in the ArmPPIGen::get() method. Since the
PMU is registering the ThreadContext only at ISA startup time, ArmPPI
generation in deferred until the PMU has a non NULL pointer.
Change-Id: I17daa6f0e355363b8778d707b440cab9f75aaea2
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/16204
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
A version of Linux kernel initializes counters before enabling them.
Without this change, gem5 overwrites the value of counter, which causes
incorrect counter values derived by kernel.
Change-Id: If0c515111103018d5f65f74434d7711a67aeaee4
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/16203
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
In every arm platform which is making use of them, mem_regions are
interpreted as a pair of start address and size. However arm
SimpleSystem, which is using VExpress_GEM5_V1, is interpreting them as
start address and end address. This patch is fixing this mismatch.
Change-Id: I0b2a2193cd07fbc5430f233438269a9c7c353df9
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Ciro Santilli <ciro.santilli@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/16205
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Fence instruction origially had two flags NonSpeculative and
MemBarrier. In O3 model, MemBarrier instructions are inserted
into the instruction queue by the InstructionQueue::insertBarrier (at
src/cpu/o3/iew_impl.hh:1083). Barrier instructions are implicitly
assumed to be non-speculative.
Adding NonSpeculative flag to fence instruction makes it inserted into
the instruction queue twice (at src/cpu/o3/iew_impl.hh:1083 and :1111).
This can lead to a deadlock if both pointers to the instruction are not
cleared from the queue when the instruction retires.
This patch removes NonSpeculative flag from the fence inst.
Change-Id: I26573d12a0b52f43b73c0e51158286dc98d05ea4
Reviewed-on: https://gem5-review.googlesource.com/c/8183
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Alec Roelke <ar4jc@virginia.edu>
Maintainer: Alec Roelke <ar4jc@virginia.edu>
When a thread is activated by another thread calling a clone system
call, the child thread's context is initialized in the middle of the
clone system call and before the context is fully initialized.
Therefore, the child thread starts fetching an unitialized PC, which
could lead to a page fault.
This patch adds a pipeline wakeup event that is scheduled later in the
cycle when the thread is activated. This event ensures that the first
fetch only happens after the thread context is fully initialized
(e.g., in case of clone syscall, it is when the parent thread copies
its context over to the child thread).
When a thread first starts or wakes up, input queue to the Fetch2 stage
needs to be drained since the execution flow is likely to change and
previously fetched instructions in the queue may no longer be in the
correct flow. This patch dumps/drains all inputs in the input queue
of a thread context in the Fetch2 stage when the associated thread wakes
up.
Change-Id: Iad970638e435858b7289cd471158cc0afdbbb0e5
Reviewed-on: https://gem5-review.googlesource.com/c/8182
Reviewed-by: Brandon Potter <Brandon.Potter@amd.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Brandon Potter <Brandon.Potter@amd.com>
Since f2bda876f7, the build system started
adding a length for generated blobs as in:
const std::size_t variable_len = 123;
There were two types of blob files, ones with a header and the ones
without.
The ones with the header, also include the header in the .cc of the blob,
which contains a declaration:
extern const std::size_t variable_len;
Therefore, the ones without header, don't have that extern declaration,
which makes them static according to the C++ standard.
clang then correctly interprets that as problematic due to
-Wunused-const-variable, while GCC does not notice this.
This patch removes the length declaration from the blob files that don't
have the header. Those files currently don't use the length.
Change-Id: I3fc61b28f887fc1015288857328ead2f3b34c6e6
Reviewed-on: https://gem5-review.googlesource.com/c/15955
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
A recent patch add the use of the macro:
CMSG_ALIGN
This macro is not very cross-platform, and needs to be
defined according to the platform.
This patch defines the missing macro on MacOS.
Change-Id: I582f69e652dc060b4532358141179ad6d37eafc7
Reviewed-on: https://gem5-review.googlesource.com/c/16102
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Brandon Potter <Brandon.Potter@amd.com>
This implementation is based in the description available in:
Jinchun Kim, Seth H. Pugsley, Paul V. Gratz, A. L. Narasimha Reddy,
Chris Wilkerson, and Zeshan Chishti. 2016.
Path confidence based lookahead prefetching.
In The 49th Annual IEEE/ACM International Symposium on Microarchitecture
(MICRO-49). IEEE Press, Piscataway, NJ, USA, Article 60, 12 pages.
Change-Id: I4b8b54efef48ced7044bd535de9a69bca68d47d9
Reviewed-on: https://gem5-review.googlesource.com/c/14819
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Python 2.7 requires a workaround when wrapping exit objects to
explicitly convert the return of getCode() to int to not confuse
sys.exit. This workaround isn't needed and doesn't work on Python 3
since it doesn't have a separate long integer type.
Change-Id: I57bc3fd8f4699676c046ece8a52baa2796959ffd
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/15978
Reviewed-by: Gabe Black <gabeblack@google.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Some tests are really just a wrapper around a test script in
configs/. Add a helper method to wrap these scripts to make sure they
are executed in a consistent environment. This wrapper sets up a
global environment that is identical to that created by main() when it
executes the script. Unlike the old wrappers, it updates the module
search path to make relative imports work correctly in Python 3.
Change-Id: Ie9f81ec4e2689aa8cf5ecb9fc8025d3534b5c9ca
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/15976
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
Most Ruby tests assume that the highest frequency in the system under
test is 1GHz and limits the global tick rate to this frequency. This
assumption is broken since the default Ruby configuration scripts
clock the CPU at 2Ghz, which results in warnings and sometimes
incorrect behaviour.
Change-Id: I4b204660862ce3b0ea4a13df42caacd4398fef8c
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/15975
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>