Before the commit, the bootloader had a hardcoded entry point that it
would jump to.
However, the Linux kernel arm64 v5.8 forced us to change the kernel
entry point because the required memory alignment has changed at:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
commit/?h=v5.8&id=cfa7ede20f133cc81cef01dc3a516dda3a9721ee
Therefore the only way to have a single bootloader that boots both
pre-v5.8 and post-v5.8 kernels is to pass that information from gem5
to the bootloader, which we do in this patch via registers.
This approach was already used by the 32-bit bootloader, which passed
that value via r3, and we try to use the same register x3 in 64-bit.
Since we are now passing this information, the this patch also removes
the hardcoding of DTB and cpu-release-addr, and also passes those
values via registers.
We store the cpu-release-addr in x5 as that value appears to have a
function similar to flags_addr, which is used only in 32-bit arm and
gets stored in r5.
This commit renames atags_addr to dtb_addr, since both are mutually
exclusive, and serve a similar purpose, DTB being the newer recommended
approach.
Similarly, flags_addr is renamed to cpu_release_addr, and it is moved
from ArmSystem into ArmFsWorkload, since it is not an intrinsic system
property, and should be together with dtb_addr instead.
Before this commit, flags_addr was being set from FSConfig.py and
configs/example/arm/devices.py to self.realview.realview_io.pio_addr
+ 0x30. This commit moves that logic into RealView.py instead, and
sets the flags address 8 bytes before the start of the DTB address.
JIRA: https://gem5.atlassian.net/browse/GEM5-787
Change-Id: If70bea9690be04b84e6040e256a9b03e46710e10
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35076
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
These functions return iterators which are inconsistent with the usage
model for this type. It should be accessed using the peek, push, and pop
methods and not iterators. If you need a class with iterators which is
oriented around accessing individual elements at a time, the
CircularQueue type is likely a better choice.
Change-Id: I9f37eab12e490b63d870d378a91f601dad353f25
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38998
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This is implemented similary to the NonCachingSimpleCPU, except that
both the normal atomic and noncaching atomic behaviors are implemented
by the same class. The sendDma function now dispatches to a method which
implements one or the other behavior since that function was getting too
big and complex.
Change-Id: I7972971ef41d1373424e587cf67c8444d50de748
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38719
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Instead of generating all of the DMA packets when a request is
initiated, keep track of the necessary properties and generate them as
needed.
The primary benefit of this approach is that if we don't actually need
packets, for instance if we can satisfy the request using a memory
backdoor, we can just skip them. Otherwise we'll have wasted time
creating them, and then have to clean them up.
Change-Id: I04d399fb7bce1ff9a44979c311be356baf2db555
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38717
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
CircularQueue provides iterators which make it easier for users to
interact with it and helps abstract its internal state, but at the same
time it prevents standard algorithms like std::copy from recognizing
opportunities to use bulk copies to speed up execution. It also hides
the seams when wrapping around the buffer happens which std::copy
wouldn't know how to handle.
CircleBuf seems to be intended as a simpler type which doesn't hold
complex entries like the CircularQueue does, and instead just acts as a
wrap around buffer, like the name suggests. This change reimplements it
to not use CircularQueue, and to instead use an underlying vector.
The way internal indexing is handled CircularQueue was simplified
recently, and using the same scheme here means that this code is
actually not much more verbose than it was before. It also intrinsically
handles indexing and bulk accesses, and uses std::copy_n in a way that
lets it recognize and take advantage of contiguous storage and bulk
copies.
Change-Id: I78e7cfe174c52f60c95c81e5cd3d71c6052d4d41
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38896
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Gabe Black <gabe.black@gmail.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This method lets you stretch the current chunk, if you want to skip over
some of the upcoming chunks since you've already handled their bytes.
For instance, if you were iterating over pages in a range of virtual
addresses, you might be able to handle multiple page sized chunks at
once if they were represented by a single large sized page table entry.
This mechanism would let you move past all the pages you had just
handled without having to walk through them all one by one.
Change-Id: I7d962f548947b77f0aa1b725036dbcc9e1b28659
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38718
Reviewed-by: Gabe Black <gabe.black@gmail.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
FLAT instructions used to decrement lgkm count on execute, while the
GCN3 ISA specifies that lgkm count should be decremented on data being
returned or data being written.
This patch changes it so that lgkm is decremented after initiateAcc (for
stores) and after completeAcc (for loads) to better reflect the ISA
definition.
This fixes a bug where waitcnts would be satisfied even though the
memory access wasn't completed, which lead to instructions using the
wrong data.
Change-Id: I596cb031af9cda8d47a1b5e146e4a4ffd793d36c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38696
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This function had a comment claiming that returning an arbitrary request
from the call was necessary for page table walker statistics, but
looking at the actual code, the return type was never used. Also
returning whatever the last request happens to be seems arbitrary, and a
bad boundary for modularization. The page table walker should not depend
on the internal implementation of the DMA port.
Change-Id: I00281fbaf6aeb85b15baf54f3d4a23ca1ac72b8b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38716
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
ARM reaches in and pads out the source register index list behind the
parser's back to force dest regs to also be sources in case an
instruction fails predication and needs to forward the original register
values. It shouldn't be hacking up these values in that way, but since
it is, this will let it continue to do so while still fitting in the new
system where each instruction allocates its src/dest reg index arrays to
size.
Change-Id: Ia296be9f63123f18f6cdc0d3bb1314d33e759b3a
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38380
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This class had been trying to keep all indices within the modulus of the
queue size, and to use all elements in the underlying storage by making
the empty and full conditions alias, differentiated by a bool. To keep
track of the difference between a storage location on one trip around
the queue vs other times around, ie an alias once the indices had
wrapped, it also keep track of a "round" value in both the queue itself,
and any iterators it created.
All this bookkeeping significantly complicated the data structure.
Instead, this change modifies it to keep track of a monotonically
increasing index which is wrapped at the time it's used. Only the head
index and current size need to be tracked in the queue itself, and only
a pointer to the queue and an index need to be tracked in the iterators.
Theoretically, it's possible that this value could overflow eventually
since it increases forever, unlike before where the index wrapped and
was never larger than the queue's capacity. In practice, the type of the
index was changed from a uint32_t to a size_t, probably a 64 bit value
in modern systems, which will hold much larger values. Also, the round
counter and the index values together acted like a smaller than 64 bit
value anyway, since the round counter would overflow after 2^32 times
around a less than 2^32 entry queue.
One minor interface difference is that the head() and tail() values
returned by the queue are no longer pre-wrapped to be modulo the queue's
capacity. As long as consumers don't try to be overly clever and feed in
fixed values, do their own bounds checking, etc., something that would
be cumbersome considering the wrapping nature of the structure, this
shouldn't be an issue.
Also, since external consumers no longer need to worry about wrapping,
since only one of them was used in only one place, and because they
weren't even marked as part of the interface, the modulo helper
functions have been eliminated from the queue. If other code wants to
perform modulo arithmetic for some reason (which the queue no longer
requires) they can accomplish basically the same thing in basically the
same amount of code using normal math.
Also, rather than inherit from std::vector, this change makes the vector
internal to the queue. That prevents methods of the vector that aren't
aware of the circular nature of the structure from leaking out if
they're not overridden or otherwise proactively blocked.
On top of simplifying the implementation, this also makes it perform
*slightly* better. To measure that, I ran the following command:
$ time build/ARM/base/circular_queue.test.opt --gtest_repeat=100000 > /dev/null
and found a few percent improvement in total run time. While this
difference was small and not measured against realistic usage of the
data structure, it was still measurable, and at minimum doesn't seem to
have hurt performance.
Change-Id: Ic2baa28de135be7086fa94579bbec451d69b3b15
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38478
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This commit makes move stats from several classes in mem/ruby
to corresponding Stats::Group's.
For ruby's Profiler, additional changes are made: there are stats that
are profiled for each of RequestType, for each of MachineType, and for
each of combinations of RequestType and MachineType. The current naming
scheme is ...<stat_name>.<request_type_name>.<machine_type_name>. To make
it easier for stats parser to know whether the stat is of RequestType, or
is of MachineType, or is of (RequestType, MachineType), a prefix is added
as follows,
...<meta>.<stat_name>.<request_type_name>.<machine_type_name>
where <meta> is one of {RequestType, MachineType, RequestTypeMachineType}.
Another point of using this naming scheme is that the parser doesn't
need to know all of RequestType and MachineType.
Change-Id: I8b8bdd771c7798954f984d416f521e8eb42d01ed
Signed-off-by: Hoa Nguyen <hoanguyen@ucdavis.edu>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/36478
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
SimObject outputs a warning when its parent is specified more than
once. The cause is most likely that there is unexpected param
specified in the constructor called in the Python interface.
This commit adds a note about this probable cause of this potential
error to the warning message.
Change-Id: I9b6bf5d5fb0c77bfdad5fde42e88f814e8a4b72b
Signed-off-by: Hoa Nguyen <hoanguyen@ucdavis.edu>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38359
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
To limit the number of license slots used by SCons when building fast
model components, the fastmodel SConscript set up a group of nodes
which are attached to each simgen run using the SCons SideEffect
method using one of the library files it generates.
To create each unique node, the SCons Value() method was used, passing
it the counter for the loop. In at least version 4 of SCons, what this
ended up doing was setting that library file as a source for each of
the Value() nodes it corresponds to.
That doesn't *seem* like a problem, but then when creating config
include files, files which expose SCons configuration values to C++,
they also create Value() nodes using the value of the config variable.
In cases where that variable is boolean, the value might be 0 or 1.
The result was that the config header depended on Value(0) (for
instance), and then Value(0) depended on a collection of static library
files.
When scons tried to determine whether the config file was up to date,
it tried to check if if its sources had changed. It would check
Value(0), and then Value(0) would try to compute a checksum for its own
source. To do that, it seems to assume that the value can be
interpreted as a string and tries to decode it as utf8. Since the
library is a binary file, that would fail and break the build with a
cryptic message from within the guts of SCons.
To address this, this change replaces the loop index with a call to
object(). Each instance created in that way will be different from
every other, and there will be no way (purposefully or otherwise) to
create a collision with it when creating Value() nodes for some other
purpose.
Change-Id: I56bc842ae66b8cb36d3dcbc25796b708254d6982
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38617
Reviewed-by: Yu-hsin Wang <yuhsingw@google.com>
Reviewed-by: Ahbong Chang <cwahbong@google.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
These values were (seemingly) arbitrarily changed from the original,
non-KVM settings, and no longer matched the comments which were also
copied over. These two bits enable alignment checking on memory accesses
(not normally used on x86), and whether kernel code can write to read
only pages.
Change-Id: I48e560e448e4849607f12e9336d1ab0458ad9407
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38536
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Implementation of a generic frequency-based sampling
compressor. The compressor goes through a sampling stage,
where no compression is done, and the values are simply
sampled for their frequencies. Then, after enough samples
have been taken, the compressor starts generating
compressed data.
Compression works by comparing chunks to the table of
most frequent values. In theory, a chunk that is present
in the frequency table has its value replaced by the
index of its respective entry in the table. In practice,
the value itself is stored because there is no straight-
forward way to acquire an index from an entry.
Finally, the index can be encoded so that the values
with highest frequency have smaller codeword representation.
Its Huffman coupling can be used similar to the approach
taken in "SC 2 : A Statistical Compression Cache Scheme".
Change-Id: Iae0ebda08e8c08f3b62930fd0fb7e818fd0d141f
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/37335
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
For some segments, there are two base registers. One is the
architecturally visible base, and the other is the effective base used
when actually referencing memory relative to that segment. The process
initialization code was setting the architecturally visible base,
presumably because that's the value used by KVM, but was setting the
effective base to zero.
Change-Id: I06e079f24fa63f0051268437bf00c14578f62612
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38488
Reviewed-by: Gabe Black <gabe.black@gmail.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
When a gem5 op is triggered using a KVM MMIO exit event, the PC has
already been advanced beyond the offending instruction. Normally when
a system call or gem5 op is triggered, the PC has not advanced because
the instruction hasn't actually finished executing. This means that if
a gem5 op, and by extension a system call in SE mode, want to advance
the PC to the instruction after the gem5 op, they have to check whether
they were triggered from KVM.
To avoid having to special case these sorts of situations (currently
only in the clone system call), we can have the code which dispatches to
gem5 ops from KVM adjust the next PC so that it points to what the
current PC is. That way the PC can be advanced unconditionally, and will
point to the instruction after the one that triggered the call.
To be fully consistent, we would also need to adjust the current PC.
That would be non-trivial since we'd have to figure out where the
current instruction started, and that may not even be possible to
unambiguously determine given x86's instruction structure. Then we would
also need to restore the original PC to avoid confusing KVM.
Change-Id: I9ef90b2df8e27334dedc25c59eb45757f7220eea
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38486
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
These pseudo insts are less useful outside of full system, but they
should all still work. Removing this check makes it possible to, for
instance, test them in syscall emulation mode, and removes another
difference between the two styles of simulation.
Change-Id: Ia7d29bfc6f7c5c236045d151930fc171a6966799
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38485
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>