This class had been trying to keep all indices within the modulus of the
queue size, and to use all elements in the underlying storage by making
the empty and full conditions alias, differentiated by a bool. To keep
track of the difference between a storage location on one trip around
the queue vs other times around, ie an alias once the indices had
wrapped, it also keep track of a "round" value in both the queue itself,
and any iterators it created.
All this bookkeeping significantly complicated the data structure.
Instead, this change modifies it to keep track of a monotonically
increasing index which is wrapped at the time it's used. Only the head
index and current size need to be tracked in the queue itself, and only
a pointer to the queue and an index need to be tracked in the iterators.
Theoretically, it's possible that this value could overflow eventually
since it increases forever, unlike before where the index wrapped and
was never larger than the queue's capacity. In practice, the type of the
index was changed from a uint32_t to a size_t, probably a 64 bit value
in modern systems, which will hold much larger values. Also, the round
counter and the index values together acted like a smaller than 64 bit
value anyway, since the round counter would overflow after 2^32 times
around a less than 2^32 entry queue.
One minor interface difference is that the head() and tail() values
returned by the queue are no longer pre-wrapped to be modulo the queue's
capacity. As long as consumers don't try to be overly clever and feed in
fixed values, do their own bounds checking, etc., something that would
be cumbersome considering the wrapping nature of the structure, this
shouldn't be an issue.
Also, since external consumers no longer need to worry about wrapping,
since only one of them was used in only one place, and because they
weren't even marked as part of the interface, the modulo helper
functions have been eliminated from the queue. If other code wants to
perform modulo arithmetic for some reason (which the queue no longer
requires) they can accomplish basically the same thing in basically the
same amount of code using normal math.
Also, rather than inherit from std::vector, this change makes the vector
internal to the queue. That prevents methods of the vector that aren't
aware of the circular nature of the structure from leaking out if
they're not overridden or otherwise proactively blocked.
On top of simplifying the implementation, this also makes it perform
*slightly* better. To measure that, I ran the following command:
$ time build/ARM/base/circular_queue.test.opt --gtest_repeat=100000 > /dev/null
and found a few percent improvement in total run time. While this
difference was small and not measured against realistic usage of the
data structure, it was still measurable, and at minimum doesn't seem to
have hurt performance.
Change-Id: Ic2baa28de135be7086fa94579bbec451d69b3b15
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38478
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
When a gem5 op is triggered using a KVM MMIO exit event, the PC has
already been advanced beyond the offending instruction. Normally when
a system call or gem5 op is triggered, the PC has not advanced because
the instruction hasn't actually finished executing. This means that if
a gem5 op, and by extension a system call in SE mode, want to advance
the PC to the instruction after the gem5 op, they have to check whether
they were triggered from KVM.
To avoid having to special case these sorts of situations (currently
only in the clone system call), we can have the code which dispatches to
gem5 ops from KVM adjust the next PC so that it points to what the
current PC is. That way the PC can be advanced unconditionally, and will
point to the instruction after the one that triggered the call.
To be fully consistent, we would also need to adjust the current PC.
That would be non-trivial since we'd have to figure out where the
current instruction started, and that may not even be possible to
unambiguously determine given x86's instruction structure. Then we would
also need to restore the original PC to avoid confusing KVM.
Change-Id: I9ef90b2df8e27334dedc25c59eb45757f7220eea
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38486
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Inside the code of cloneFunc(…) //syscall_emul.hh
cp->initState(); //line 1483
p->clone(tc, ctc, cp, flags); //line 1484
…
ctc->clearArchRegs(); //line 1503
OS::archClone(flags, p, cp, tc, ctc, newStack, tlsPtr); //line 1505
…
At line 1483, initState() is called and the activateContext() of the
corresponding MinorCPU is eventually called. The actual architecture
clone happens at line 1505 where PC of the new thread could have a
correct value.
In the existing implementation of MinorCPU::activateContext(ThreadID
thread_id), the below line 275 is called
pipeline->wakeupFetch(thread_id);
to start fetching instruction with current value of PC, which is 0x0,
leading to panic “Page table fault when accessing virtual address 0”.
This is because the OS::archClone() is not yet called. So, the below bug
fix handles the wakeup fetch for a thread for two scenarios:
...
if (!threads[thread_id]->getUseForClone())
{ //the thread is not cloned
pipeline->wakeupFetch(thread_id);
} else {//the thread from clone
if (fetchEventWrapper != NULL)
delete fetchEventWrapper;
fetchEventWrapper = new EventFunctionWrapper([this, thread_id]
{pipeline->wakeupFetch(thread_id);}, "wakeupFetch");
schedule(*fetchEventWrapper, clockEdge(Cycles(0)));
}
...
If a thread is not cloned, pipeline->wakeupFetch() is called
immediately.
For the cloned thread, the above bug fix delays the execution of
pipeline->wakeupFetch()
after the OS::archClone is done. ThreadContext::getUseForClone() return
true if a thread is cloned.
A member variable fetchEventWrapper is added to MinorCPU class for
delayed fetch event.
A member variable useForClone and its corresponding get/set methods are
added to ThreadContext class. This approach allows future reuse of this
useForClone variable by other CPU models if needed and also avoid lots
of changes resulted by modifying parameters of activateContext () and
activate() which are defined as override.
Inside the syscall cloneFunc, the useForClone member of a ThreadContext
object is set via its set method right before Process's initState() is
called, shown as below.
ctc->setUseForClone(true);
cp->initState();
p->clone(tc, ctc, cp, flags);
A few previously failed RISC-V ASM tests have been open in tests.py file
after the bug fix works.
JIRA issue: https://gem5.atlassian.net/browse/GEM5-374
Change-Id: Ibffe46522e2617443d29f49df180692c54830f14
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/37315
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
This will eventually let subclasses provide their own appropriately
sized storage for these indexes. By using a pointer to member instead of
a regular pointer, we ensure that even if the StaticInst is copied/moved
somewhere, it will still find its indexes correctly, without any
additional performance overhead or maintenance.
Unfortunately C++ has decided that arrays with known bounds are not
convertible/compatible with arrays with unknown bounds. I've found at
least two standards proposals in various stages of acceptance which say
that that's dumb and they should change that (because it's dumb and they
should change that), but in the mean time we can get everything to
compile by using the reinterpret_cast hammer. While this is
*technically* undefined behavior, it's basically not and should be
pretty safe.
Change-Id: Id747b0cf68d1a0b4809ebb66a32472187110d7d8
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/36876
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This merge commit also reverts the version info back to
'DEVELOP-FOR-V20.2' for the develop branch.
Change-Id: If6fd326cc23edf2aeaa67353d4d3fed573e9ddd6
Previously, ThreadStateStats uses ThreadState::threadId() to
determine the name of the stats. However, in the ThreadState
constructor, ThreadStateStats is initialized before ThreadState
is intialized. As a result, the name of ThreadStateStats has
a wrong ThreadID.
This commit uses ThreadID instead of ThreadState to determine
the name of the stats.
This causes a name collision between ThreadStateStats and
ExecContextStats as both have the name of "thread_[tid]".
Ideally, those stats should be merged to the BaseSimpleCPU.
However, both ThreadStateStats and ExecContextStats have
a stat named numInsts. So, for now, ExecContextStats will
have a name of "exec_context.thread_[tid]", while ThreadStateStats
keeps its name.
Change-Id: If9a21549f98bd6e3ce6dc29bdf183e8fd5f51a67
Signed-off-by: Hoa Nguyen <hoanguyen@ucdavis.edu>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/37455
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
If the memory system can provide a back door to memory, store that, and
use it for subsequent accesses to the range it covers. For now, this
covers only fetch. That's because fetch will generally happen more than
loads and stores, and because it's relatively simple to implement since
we can ignore atomic operations, etc.
Some limitted benchmarking suggests that this speeds up x86 linux boot
by about 20%, although my modifications to the config to remove caching
(which blocks the back door mechanism) also made gem5 crash, so it's
hard to say for sure if that's a valid result. The crash happened in the
same way before and after, so it's probably at least relatively
representative.
While this gives a pretty substantial performance boost, it will prevent
statistics from being collected at the memory, or on intermediate objects
in the interconnect like the bus. That is to be expected with this
memory mode, however.
Change-Id: I73f73017e454300fd4d61f58462eb4ec719b8d85
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/36979
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
There were accessors for reading these indexes, but they were not
consistently used. This change makes them private to StaticInst, and
changes places that were accessing them directly to instead use the
accessors. New accessors are added for code generated by the ISA parser
and some ARM code to set the indexes without accessing them directly.
By forcing these values to be behind accessors, it will be much simpler
to change how those values are stored and retrieved.
Change-Id: Icca80023d7f89e29504fac6b194881f88aedeec2
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/36875
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This patch adds the GPU protocol tester that uses data-race-free
operation to discover bugs in GPU protocols including GPU_VIPER. For
more information please see the following paper and the README:
T. Ta, X. Zhang, A. Gutierrez and B. M. Beckmann, "Autonomous
Data-Race-Free GPU Testing," 2019 IEEE International Symposium on
Workload Characterization (IISWC), Orlando, FL, USA, 2019, pp. 81-92,
doi: 10.1109/IISWC47752.2019.9042019.
Change-Id: Ic9939d131a930d1e7014ed0290601140bdd1499f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32855
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The create() method on Params structs usually instantiate SimObjects
using a constructor which takes the Params struct as a parameter
somehow. There has been a lot of needless variation in how that was
done, making it annoying to pass Params down to base classes. Some of
the different forms were:
const Params &
Params &
Params *
const Params *
Params const*
This change goes through and fixes up every constructor and every
create() method to use the const Params & form. We use a reference
because the Params struct should never be null. We use const because
neither the create method nor the consuming object should modify the
record of the parameters as they came in from the config. That would
make consuming them not idempotent, and make it impossible to tell what
the actual simulation configuration was since it would change from any
user visible form (config script, config.ini, dot pdf output).
Change-Id: I77453cba52fdcfd5f4eec92dfb0bddb5a9945f31
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35938
Reviewed-by: Gabe Black <gabeblack@google.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
On the O3 CPU, when the number of threads on the CPU (SMT) is too low to
hold all the old style CPU workload items, then it would increase the
number of threads to match. There are three problems with this.
1. This behavior was only implemented on O3.
2. It could silently hide a bug in the config where the number of
workload items was accidentally too big.
3. It makes the DerivO3CPUParams struct tamper with itself in the
create() method, which means not even config.ini will accurately
reflect the actual config of the system.
Change-Id: I0aab70d4b98093f7f14156ca437e763f031049ab
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35937
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Currently, when the numThreads parameter is set to something larger than
1 in full system mode, the O3 CPU will just silently change it back down
again to 1. This could be confusing to the user since it won't be
immediately apparent, even when looking at config.ini, that their config
isn't being respected.
This change moves that check into the CPU constructor, where CPU
behavior probably should be rather than the create() method which should
just build the object, and also turns it into an error.
Change-Id: I627ff8702b5e8aaad8839aa8d52524690be25619
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35936
Reviewed-by: Gabe Black <gabeblack@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The fetch policy is only meaningful for SMT simulations. The
"SingleThreaded" value is a placeholder which is the default, and is
only supposed to be used in non-SMT simulations.
Rather than have this enum value and have special checks for it in
various places in O3, we can just eliminate it and set the default,
which is still only meaningful in SMT simulations, be an SMT fetch
policy.
The DerivO3CPUParams::create() function would forcefully change the
the fetch policy from "SingleThreaded" to "RoundRobin" anyway if there
were more than one thread, so that can be the actual default instead of
the shadow effective default.
Change-Id: I458fda00b5bcc246b0957e6c937eab0c5b4563c3
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35935
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The byteEnable variable is used for masking bytes in a memory request.
The default behaviour is to provide from the ExecContext to the CPU
(and then to the LSQ) an empty vector, which is the same as providing
a vector where every element is true.
Such vectors basically mean: do not mask any byte in the memory request.
This behaviour adds more complexity to the downstream LSQs, which now
have to distinguish between an empty and non-empty byteEnable.
This patch is simplifying things by transforming an empty vector into
a all true one, making sure the CPUs are always receiving a non empty
byteEnable.
JIRA: https://gem5.atlassian.net/browse/GEM5-196
Change-Id: I1d1cecd86ed64c53a314ed700f28810d76c195c3
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/23285
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Tested-by: kokoro <noreply+kokoro@google.com>
Commits 02745afd and f9b4e32 introduced a mechanism for creating checkpoint
objects for hardware transactional memory (HTM) and Arm TME. Because the
checkpoint object also contains the local UID of a transaction, it is
needed before any architectural checkpointing takes places. This caused
segfaults when running HTM codes.
This commit allows ISAs to allocate a checkpoint once at the beginning
of simulation. In order to do that we need to remove the validity check
assertion; the cpt will become valid only after a first successfull
transaction start
Change-Id: I233d01805f8ab655131ed8cd6404950a2bf6fbc7
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35015
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This change replaces the __attribute__ syntax with the now standard [[]]
syntax. It also reorganizes compiler.hh so that all special macros have
some explanatory text saying what they do, and each attribute which has a
standard version can use that if available and what version of c++ it's
standard in is put in a comment.
Also, the requirements as far as where you put [[]] style attributes are
a little more strict than the old school __attribute__ style. The use of
the attribute macros was updated to fit these new, more strict
requirements.
Change-Id: Iace44306a534111f1c38b9856dc9e88cd9b49d2a
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35219
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Before this commit, ExecSymbol would show only the symbol and no address:
0: system.cpu: A0 T0 : @_kernel_flags_le_lo32+6 : mrs x0, currentel
After this commit, it shows the symbol in addition to the address:
0: system.cpu: A0 T0 : 0x10 @_kernel_flags_le_lo32+6 : mrs x0, currentel
Change-Id: I665802f50ce9aeac6bb9e174b5dd06196e757c60
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35077
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This macro probably would have been defined to "return" in some cases,
to be put after a call to a function that doesn't return so that the
compiler wouldn't think control would reach the end of a non-void
function. It was only ever defined to expand to nothing, and now that
[[noreturn]] is a standard attribute, it should never be needed going
forward.
Change-Id: I37625eab72deeaede77f9347116b9fddd75febf7
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35217
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>