Inside the code of cloneFunc(…) //syscall_emul.hh
cp->initState(); //line 1483
p->clone(tc, ctc, cp, flags); //line 1484
…
ctc->clearArchRegs(); //line 1503
OS::archClone(flags, p, cp, tc, ctc, newStack, tlsPtr); //line 1505
…
At line 1483, initState() is called and the activateContext() of the
corresponding MinorCPU is eventually called. The actual architecture
clone happens at line 1505 where PC of the new thread could have a
correct value.
In the existing implementation of MinorCPU::activateContext(ThreadID
thread_id), the below line 275 is called
pipeline->wakeupFetch(thread_id);
to start fetching instruction with current value of PC, which is 0x0,
leading to panic “Page table fault when accessing virtual address 0”.
This is because the OS::archClone() is not yet called. So, the below bug
fix handles the wakeup fetch for a thread for two scenarios:
...
if (!threads[thread_id]->getUseForClone())
{ //the thread is not cloned
pipeline->wakeupFetch(thread_id);
} else {//the thread from clone
if (fetchEventWrapper != NULL)
delete fetchEventWrapper;
fetchEventWrapper = new EventFunctionWrapper([this, thread_id]
{pipeline->wakeupFetch(thread_id);}, "wakeupFetch");
schedule(*fetchEventWrapper, clockEdge(Cycles(0)));
}
...
If a thread is not cloned, pipeline->wakeupFetch() is called
immediately.
For the cloned thread, the above bug fix delays the execution of
pipeline->wakeupFetch()
after the OS::archClone is done. ThreadContext::getUseForClone() return
true if a thread is cloned.
A member variable fetchEventWrapper is added to MinorCPU class for
delayed fetch event.
A member variable useForClone and its corresponding get/set methods are
added to ThreadContext class. This approach allows future reuse of this
useForClone variable by other CPU models if needed and also avoid lots
of changes resulted by modifying parameters of activateContext () and
activate() which are defined as override.
Inside the syscall cloneFunc, the useForClone member of a ThreadContext
object is set via its set method right before Process's initState() is
called, shown as below.
ctc->setUseForClone(true);
cp->initState();
p->clone(tc, ctc, cp, flags);
A few previously failed RISC-V ASM tests have been open in tests.py file
after the bug fix works.
JIRA issue: https://gem5.atlassian.net/browse/GEM5-374
Change-Id: Ibffe46522e2617443d29f49df180692c54830f14
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/37315
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
This will eventually let subclasses provide their own appropriately
sized storage for these indexes. By using a pointer to member instead of
a regular pointer, we ensure that even if the StaticInst is copied/moved
somewhere, it will still find its indexes correctly, without any
additional performance overhead or maintenance.
Unfortunately C++ has decided that arrays with known bounds are not
convertible/compatible with arrays with unknown bounds. I've found at
least two standards proposals in various stages of acceptance which say
that that's dumb and they should change that (because it's dumb and they
should change that), but in the mean time we can get everything to
compile by using the reinterpret_cast hammer. While this is
*technically* undefined behavior, it's basically not and should be
pretty safe.
Change-Id: Id747b0cf68d1a0b4809ebb66a32472187110d7d8
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/36876
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Certain instructions (some atomics and buffer_wbinvl1_vol) deadlock
in the coalescer, where sendTimingReq fails, fails a retry, and then
never retries again.
This fix sets m_cache_inv_pkt to null before calling
completeHitCallback(), as that allows the failed packets to be retried
again.
Change-Id: I4a51c741360f385f8b4c3f2a31a9410f18e095d9
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/37477
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
If an `AddrRange` is not interleaved, return the input address in
`removeIntlvBits` and `addIntlvBits` to prevent undefined behavior. It
allows to use these methods in all cases without having to check
manually whether the range is interleaved.
Change-Id: Ic6ac8c4e52b09417bc41aa9380a24319c34e0b35
Signed-off-by: Isaac Sánchez Barrera <isaac.sanchez@bsc.es>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/37617
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
HSA packet processor will now accept and process agent packets.
Type field in packet is command type.
For now:
AgentCmd::Nop = 0
AgentCmd::Steal = 1
Steal command steals the completion signal for a running kernel.
This enables a benchmark to use hsa primitives to send an agent
packet to steal the signal, then wait on that signal.
Minimal working example to be added in gem5-resources.
Change-Id: I37f8a4b7ea1780b471559aecbf4af1050353b0b1
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/37015
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
These files are used to generate stubs for calling across GRPC
protocols, an RPC mechanism which is based around the protocol buffer
system.
The support for these files is heavily based on and calls into the
existing protobuf file support, but with the extra step which generates
the additional .grpc.pb.cc and .grpc.pb.h files.
Change-Id: I89022928c08aa9f7ed024b7380ddcc54ca75b55e
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/37277
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
There are several benefits to using a Builder. First, the action we're
executing is shared between all uses of the Builder. The number of
times this particular builder is called is small, but it should still
be a little more efficient.
Second, we can use SCons's emitter mechanism to generate the .pb.cc and
.pb.h target files in a little more general way.
Also, this change adds a Scanner for .proto files which will scan them
for imports and let SCons manage those implicit dependencies properly.
The scanner is a bit simplistic as described in a comment in the
source, but should work pretty well in practice with reasonably
formatted files, and in particular some files I'm working with that
include imports.
Change-Id: Iaf2498e61133d6f713d6ccaf199422b882c5894f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/37276
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The ProtoBuf support in src/SConscript was split into two parts, one
where the ProtoBuf sources were declared, and the other where scons was
told how to buld the .cc and .hh files and the .cc was added to the
build.
As far as I can tell, there was no real reason to have things split up
like that, at least not currently. This change moves everything into
the ProtoBuf class definition, and this should behave the same as
before but be a little easier to understand and maintain.
Change-Id: I02320f50ece53d90c14b5062bd6b1167210f46c3
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/37275
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
There were two issues with how paths were handled for these files.
1. The code in the ProtoBuf class would drop the subdirectory part of
the path name when generating the name of the .cc and .h files the
protoc compiler would output. Since protoc wouldn't generate files
where scons expected, it would fail when it tried to build the .cc.
2. protoc will use the --proto_path and --cpp_out settings to figure
out what path to use for generated files. It will remove the
--proto_path prefix it found the .proto file with from the files path,
and then add the rest to the --cpp_out prefix.
The input files should come from the build directory using symlinks
set up by scons, and the output files should end up alongside them.
That means the --proto_path setting should be the build directory, and
so should --cpp_out. That's fortunately simpler than what was there
before, since it doesn't depend on what the source or targets are.
Change-Id: I69692d2fe3813011982f0c1c9824589a132f93ed
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/37218
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The function was defined in a .cc file but marked as inline. gcc seems
to often figure out what it should do, but in clang it doesn't export
the function (since it's marked as inline), and during linking external
references, which don't have a local copy since it's not defined in the
.hh file, will fail.
This failure looks particularly odd because the funciton is virtual,
and so the failure is reported as being unable to compose the vtable
in places where the object is constructed, relatively obscure code
which is generated by the build system and obscured by templates from
an external code base (pybind11).
Change-Id: Ib51aefbf9005e4ca8dfebef32c5def472175f115
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/37436
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The compression factor of a block is measured according to the maximum
achievable compression ratio.
For example, if up to 4 blocks can co-allocate in a superblock, and
a cache line has 512 bits, the possible compression factors are 1
(uncompressed, <=512 bits), 2 (compressed, <=256 bits), 4 (compressed,
<=128 bits).
This is an approach similar to the one described in "Yet Another
Compressed Cache", by Sardashti et al.
Change-Id: I52ef36989f3eeef6fc8890132a57f995ef9c5258
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/36581
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
When setting the size of a compressed block, its compressibility
needs to be recalculated based on that, so move such functionality
to be done after the block has been inserted, within setSizeBits.
As a side effect, insertBlock does not need to be overridden
anymore.
Change-Id: I608f876cd2110ac5e394ffad5b29941ba458ba91
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/36580
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Data contractions happen when a block passes from a less compressed
(e.g., uncompressed) to a more compressed (e.g., compressed) state.
Some compaction methods enforce that a block can only be allocated
in a location matches an exact compression factor, thus on data
contractions such blocks must be moved to another location, or
they must be padded to fake a bigger size.
For compaction methods that do not have that limitation, performance
can be improved if the contracted block is moved to co-allocate with
another existing entry, since it frees up an entry.
Change-Id: I302bc561b897f9d3ce1426331fe4b5c2df76f4b5
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/36578
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
When searching for victims of a data expansion a simple approach to
make room for the expanded block is to evict every co-allocatable
block. This, however, ignores replacement policies and tends to be
inefficient. Besides, some cache compaction policies do not allow
blocks that changed their compression ratio to be allocated in the
same location (e.g., Skewed Compressed Caches), so they must be
moved elsewhere.
The replacement policy approach asks the replacement policy which
block(s) would be the best to evict in order to make room for the
expanded block. The other approach, on the other hand, simply evicts
all co-allocated entries. In the case the replacement policy selects
the superblock of the block being expanded, we must make sure the
latter is not evicted/moved by mistake.
This patch also allows the user to select which approach they would
like to use.
Change-Id: Iae57cf26dac7218c51ff0169a5cfcf3d6f8ea28a
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/36577
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Some cache techniques may need to move a block's metadata information
into another block. This must have some limitations to avoid mistakes:
- The destination entry must be invalid, otherwise the replacement
policy steps would be skipped.
- The source entry must be valid, otherwise there would be no point
in moving their metadata contents.
- The entries locations (set, way, offset...) must not be moved, since
they are fixed. The same principle is applied to the location specific
variables, such as the replacement pointer
Why it would be used:
For example, when using compression, and a block goes from uncompressed
to compressed state due to an overwrite, after the tag lookup
(sequential access) it can be decided whether to store the new data in
the old location, or, since we might have already found the block's co-
allocatable blocks, move it to co-allocate.
Other examples of techniques that could use this functionality are
Skewed Compressed Caches, and ZCaches.
Change-Id: I96e4f8cc8c992c4b01f315251d1a75d51c28692c
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/36575
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
If the memory system can provide a back door to memory, store that, and
use it for subsequent accesses to the range it covers. For now, this
covers only fetch. That's because fetch will generally happen more than
loads and stores, and because it's relatively simple to implement since
we can ignore atomic operations, etc.
Some limitted benchmarking suggests that this speeds up x86 linux boot
by about 20%, although my modifications to the config to remove caching
(which blocks the back door mechanism) also made gem5 crash, so it's
hard to say for sure if that's a valid result. The crash happened in the
same way before and after, so it's probably at least relatively
representative.
While this gives a pretty substantial performance boost, it will prevent
statistics from being collected at the memory, or on intermediate objects
in the interconnect like the bus. That is to be expected with this
memory mode, however.
Change-Id: I73f73017e454300fd4d61f58462eb4ec719b8d85
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/36979
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The Python version installed in the Dockerfile for GCN3 by apt-get is
too old to build gem5. This bumps the version to the most recent Python
to avoid needing to update this file too much.
Python 3.9 is install via PPA since it is not available in the official
Ubuntu 16.04 repository. Likewise, pip is installed from "source" as it
is not available for Python 3.9 in from neither the PPA nor Ubuntu.
Change-Id: Ia919f31cf9c9063e1df091cea15590526715739b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/37219
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Daniel Gerzhoy <daniel.gerzhoy@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The methods `AddrRange::removeIntlvBits(Addr)` and
`AddrRange::addIntlvBits(Addr)` should be the inverse of one another,
but the latter did not insert the blanks for filling the removed bits in
the correct positions. Since the masks are ordered increasingly by the
position of the least significant bit of each mask, the lowest bit that
has to be inserted at each iteration is always `intlv_bit`, not needing
to be shifted to the left or right. The bits that need to be copied
from the input address are `intlv_bit-1..0` at each iteration.
The test `AddrRangeTest.AddRemoveInterleavBitsAcrossRange` has been
updated have masks below bit 12, making the old code not pass the test.
A new `AddrRangeTest.AddRemoveInterleavBitsAcrossContiguousRange` test
has been added to include a case in which the previous code fails. The
corrected code passes both tests.
This function is not used anywhere other than the tests and the class
`ChannelAddr`. However, it is needed to efficiently implement
interleaved caches in the classic mode.
Change-Id: I7d626a1f6ecf09a230fc18810d2dad2104d1a865
Signed-off-by: Isaac Sánchez Barrera <isaac.sanchez@bsc.es>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/37175
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>