The is a bug in the GPUCoalescer which occurs in the following
situation:
1) An instruction crosses a page boundary causing multiple TLB requests
to be sent.
2) The TLB responses arrive at different times, causing the vector
memory requests to be sent at different times.
3) The first vector memory request completes before the second vector
memory request arrives at the coalescer.
This caused the coalescer to consider the instruction sequence number
done and return its token. Then the second request would arrive and
complete sending back another token. Eventually this increases the token
count beyond the maximum tripping an assert.
This change keeps track of the number of per-lane requests which are
expected to be sent in the vector memory request by looking at the exec
mask of the instruction. The token is not returned until the expected
number of per-lane requests have been coalesced. This fixes "#7" in the
list of issues in JIRA-300. There are also style fixes for local
variables in code nearby the changes in this CL.
Change-Id: I152fd9397920ad82ba6079112908387e71ff3cce
JIRA: https://gem5.atlassian.net/browse/GEM5-300
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35176
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Kyle Roarty <kyleroarty1716@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
When the TokenPort was moved from the GCN3 staging branch to develop the
TokenPort was changed from being the port connecting the ComputeUnit to
Ruby's vector memory port to a sideband port which inhibits requests to
Ruby's vector memory port. As such, it needs to be explicitly connected
as a new port. This changes the getPort method in ComputeUnit to be
aware of the port as well as modifying the example config to connect to
TCPs.
The iteration to connect in the config file was modified since it was
not properly connecting to TCPs each time and Ruby.py does not
explicitly return a list of each MachineType.
Change-Id: Ia70a6756b2af54d95e94d19bec5d8aadd3c2d5c0
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35096
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The byteEnable variable is used for masking bytes in a memory request.
The default behaviour is to provide from the ExecContext to the CPU
(and then to the LSQ) an empty vector, which is the same as providing
a vector where every element is true.
Such vectors basically mean: do not mask any byte in the memory request.
This behaviour adds more complexity to the downstream LSQs, which now
have to distinguish between an empty and non-empty byteEnable.
This patch is simplifying things by transforming an empty vector into
a all true one, making sure the CPUs are always receiving a non empty
byteEnable.
JIRA: https://gem5.atlassian.net/browse/GEM5-196
Change-Id: I1d1cecd86ed64c53a314ed700f28810d76c195c3
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/23285
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Tested-by: kokoro <noreply+kokoro@google.com>
With some exceptions (in arm/x86) the standard memory read/write interface
for instructions relies upon the helper functions in
src/arch/generic/memhelpers.hh
which wrap the ExecContext interface.
(readMem, writeMem...)
Those helpers rely on the source/destination data to be provided (as
expected) but not on the size of the transaction. The latter gets
evaluated via the host size of the source/destination data
(sizeof(MemT)).
For this reason some instructions, which are instead using an
incompatible MemT data (as an example, a SIMD operation loading data in
an array of integers), make direct use of the ExecContext interface,
which is simply requesting for a pointer and a number of bytes.
Some other instructions are using the ExecContext interface since the
helpers do not accept a byteEnable argument.
This patch is adding some helpers to address these issues. The idea is
to deprecate direct usage of the ExecContext APIs.
These new wrappers do not work with the type detection mechanism
to evaluate the number of bytes we are accessing.
JIRA: https://gem5.atlassian.net/browse/GEM5-196
Change-Id: I5b822d278bdf325a68a01aa1861b6487c6628245
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/23527
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Gabe Black <gabeblack@google.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Gabe Black <gabeblack@google.com>
In efforts to reduce storage costs and download times, the images hosted
by us have been gzipped. The TestLib framework has therefore been
extended to decompress gzipped files after download.
The x86-boot-tests are, at present, the only tests which use the gem5
images. These tests have been updated to download the gzipped image.
Change-Id: I6b2dbe9472a604148834820db8ea70e91e94376f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35257
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This brings the ISA parser in line with the style guide. Note that the
docstring needs to be a single string literal for python to consider it
a docstring, and the parser itself needs each line of the docstring to
be a rule in its CFG. We can accomplish both by taking advantage of the
fact that two directly adjacent quoted strings are treated as a single
string literal by python, and by escaping the newline so that they're
actually considered adjacent.
Change-Id: I7f4d252998877808425aafb0159600ba4c3bf9ad
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35276
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Occasionally gem5's stdout/stderr, when run within the TestLib
framework, will be shuffled. This is resolved by flushing the
stdout/stderr buffer before and after simulation.
In addition to this, the verifier.py has been improved to remove
boilerplate gem5 code from the stdout comparison.
Change-Id: I04c8f9cee4475b8eab2f1ba9bb76bfa3cfcca6ec
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/34995
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Commits 02745afd and f9b4e32 introduced a mechanism for creating checkpoint
objects for hardware transactional memory (HTM) and Arm TME. Because the
checkpoint object also contains the local UID of a transaction, it is
needed before any architectural checkpointing takes places. This caused
segfaults when running HTM codes.
This commit allows ISAs to allocate a checkpoint once at the beginning
of simulation. In order to do that we need to remove the validity check
assertion; the cpt will become valid only after a first successfull
transaction start
Change-Id: I233d01805f8ab655131ed8cd6404950a2bf6fbc7
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35015
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The log_call helper is now accepting a time parameter (dictionary). If
the param is not None, the function will fill the timing indications
(user and system time) for the TestCase.
There are some TestCases whose user time is not of our interest; for
example we don't really care about the cpu time of a stdout diff
(MatchStdout tests). In those cases the resulting cpu time in the
generated JUnit file (results.xml) will be 0.
JIRA: https://gem5.atlassian.net/browse/GEM5-548
Change-Id: I53c1b59f8ad93900aeac06197e39189c00a9053c
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32653
Tested-by: kokoro <noreply+kokoro@google.com>
This change replaces the __attribute__ syntax with the now standard [[]]
syntax. It also reorganizes compiler.hh so that all special macros have
some explanatory text saying what they do, and each attribute which has a
standard version can use that if available and what version of c++ it's
standard in is put in a comment.
Also, the requirements as far as where you put [[]] style attributes are
a little more strict than the old school __attribute__ style. The use of
the attribute macros was updated to fit these new, more strict
requirements.
Change-Id: Iace44306a534111f1c38b9856dc9e88cd9b49d2a
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35219
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Before this commit, ExecSymbol would show only the symbol and no address:
0: system.cpu: A0 T0 : @_kernel_flags_le_lo32+6 : mrs x0, currentel
After this commit, it shows the symbol in addition to the address:
0: system.cpu: A0 T0 : 0x10 @_kernel_flags_le_lo32+6 : mrs x0, currentel
Change-Id: I665802f50ce9aeac6bb9e174b5dd06196e757c60
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35077
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This macro probably would have been defined to "return" in some cases,
to be put after a call to a function that doesn't return so that the
compiler wouldn't think control would reach the end of a non-void
function. It was only ever defined to expand to nothing, and now that
[[noreturn]] is a standard attribute, it should never be needed going
forward.
Change-Id: I37625eab72deeaede77f9347116b9fddd75febf7
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35217
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Some code was added fairly recently which would load a memory image
into a memory directly in order to make it easier to set up ROMs.
Unfortunately, that code accidentally used the image size instead of
the cache line size when setting up the port proxy which would actually
write the data. This happens to work when the image size is a power of
two since that's all the proxy checks for, but there's no guarantee
that every image will be sized that way.
This change instead looks into the system object, retrieves the cache
line size from it, and uses that to set up the port proxy.
Change-Id: I227ac475b855d9516e1feb881769e12ec4e7d598
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35155
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Sometimes ELF files have segments in them which are marked as loadable,
but which actually have zero size in memory. When setting up a memory
image we should drop those to avoid confusing other code which tries
to find the footprint of a memory image. No part of these segments,
including their starting address or ending address, need to actually
land on top of memory since they don't actually contain any data.
Change-Id: If8b61d10db139e0f688b6ceabcb8e6a898557469
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35156
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Added utility class `TimedWaitPID` which monkey-patches os.waitpid()
with a functor that has the same signature, but calls os.wait4()
instead. This allows the process's user and system CPU time to be
obtained from the OS when using APIs (such as subprocess) which use
os.waitpid() internally.
The process CPU time is stored within the functor and can be read back
later by calling TimedWaitPID.get_time_for_pid().
JIRA: https://gem5.atlassian.net/browse/GEM5-548
Change-Id: I9ebe9ca1241a4f28c90ad31f672f32ac52786664
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32652
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Hoa Nguyen <hoanguyen@ucdavis.edu>
This change sets gpuDynInst->exec_mask for permute and bpermute
instructions, fixing a bug where they would never write their data.
permute and bpermute instructions are load instructions that write
to a VGPR. Because of that, they use gpuDynInst->exec_mask when
checking what lanes should write to the VGPR.
gpuDynInst->exec_mask gets set to wf->execMask() as that is what other
load instructions that write to VGPRs do.
Change-Id: Ie443283488cbd2ab9c17fc255e7cc44418353419
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35036
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
(1) ThreadContexts are registered into System in BaseCPU::init.
(2) FVPBasePwrCtrl state is resized based on registered ThreadContexts
in FVPBasePwrCtrl::init.
FVPBasePwrCtrl::init may be called before BaseCPU::init based on the
model names alphabetical order, leading to segmentation faults.
To fix this, (2) is now carried out in FVPBasePwrCtrl::startup.
Change-Id: Ica6c5b7448da556d61aee53f8777a709fcad2212
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35075
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>