When decode width is larger than fetch width, the skid buffer
overflow happens at decode stage. The decode stage assumes
that fetch stage sends instructions as many as the fetch width,
but it sends them at decode width rate.
This patch makes the decode stage set its skid buffer size
according to the decode width.
Change-Id: I90ee43d16c59a4c9305c77bbfad7e4cdb2b9cffa
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67231
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Hanhwi Jang <jang.hanhwi@gmail.com>
Reviewed-by: Tom Rollet <tom.rollet@huawei.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Two W->WI transitions, on events RdBlk and Atomic in the GPU L2 cache
coherence protocol do not clear the request from the request queue upon
completing the transition. This action is not performed in the respone
path. This update adds the p_popRequestQueue action to each of these
transitions to remove the stale request from the queue.
Change-Id: Ia2679fe3dd702f4df2bc114f4607ba40c18d6ff1
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67192
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
An earlier commit added support for GLC and SLC AMDGPU instruction
modifiers. These modifiers enable cache bypassing when set. The GLC/SLC
flag information was being threaded through all the way to memory and
back so that appropriate actions could be taken upon receiving a request
and corresponding response. This commit removes the threading and adds
the bypass flag information to TBE. Requests populate this
entry and responses access it to determine the correct set of actions to
execute.
Change-Id: I20ffa6682d109270adb921de078cfd47fb4e137c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67191
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Global instructions in Vega can either use a VGPR base address plus
instruction offset or SGPR base address plus VGPR offset plus
instruction offset. Currently the VGPR address/offset is always read as
two dwords. This causes problems if the VGPR number is the last VGPR
allocated to a wavefront since the second dword would be beyond the
allocation and trip an assert.
This changeset sets the operand size of the VGPR operand to one dword
when SGPR base is used and two dwords otherwise so initDynOperandInfo
does not assert. It also moves the read of the VGPR into the calcAddr
method so that the correct ConstVecOperandU## is used to prevent another
assertion failure when reading from the register file. These two changes
are made to all flat instructions, as global instructions are a
subsegement of flat instructions.
Change-Id: I79030771aa6deec05ffa5853ca2d8b68943ee0a0
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67077
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The current atomic memory operations are templated so any type can be
used. However floating point types can not perform bitwise operations.
The GPU model contains some instructions which do atomics on floating
point types, so they need to be supported. To allow this, template
specialization is added to atomic AND, OR, and XOR which does nothing
if the type is floating point and operates as normal for integral
types.
Change-Id: I60f935756355462e99c59a9da032c5bf5afa246c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67073
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
While we include shared libraries in the Executable class, we
are not doing it when linking the SharedLib. This means the
resulting Shared library won't have the library as a dependency
(it won't appear in ldd) and the symbols will remain undefined.
Any executable will fail to link with the shared library as
the executable will contain undefined references.
This bug was exposed when I tried to link util/tlm sources with
libgem5.so. As I have libpng/libpng-dev installed in my machine,
the shared library included libpng headers, but didn't link
to the library as scons didn't append "-lpng" to the linking CL.
Those png functions thus remained ubdefined symbols.
Change-Id: Id9c4a65607a7177f71659f1ac400a67edf7080fd
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66855
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Gabe Black <gabe.black@gmail.com>
Maintainer: Daniel Carvalho <odanrc@yahoo.com.br>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
DPP processing has several issues which are fixed in this changeset:
1) Incorrect comment is updated
2) newLane calculation for shift/rotate instructions is corrected
3) A copy of original data is made so that a copy of a copy is not made
4) Reset all booleans (OOB, zeroSrc, laneDisabled) after each lane
iteration
The shift, rotate, and broadcast variants were tested by implementing
them in assembly and running on silicon.
Change-Id: If86fbb26c87eaca4ef0587fd846978115858b168
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66752
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
The bitfield extract instructions come in unsigned and signed variants.
The documentation on this is not correct, however the GCN3 documentation
gives some clues. The instruction should extract an N-bit integer where
N is defined in a source operand starting at some bit also defined by a
source operand. For signed variants of this instruction, the N-bit
integer should be sign extended but is currently not.
This changeset does sign extension using the runtime value of N by ORing
the upper bits with ones if the most significant bit is one. This was
verified by writing these instructions in assembly and running on a real
GPU. Changes are made to v_bfe_i32, s_bfe_i32, and s_bfe_i64.
Change-Id: Ia192f5940200c6de48867b02f709a7f1b2daa974
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66751
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
The GPU cache models do not support cache bypassing when the GLC or SLC
AMDGPU instruction modifiers are used in a load or store. This commit
adds cache bypass support by introducing new transitions in the
coherence protocol used by the GPU memory system. Now, instructions with
the GLC bit set will not cache in the L1 and instructions with SLC bit
set will not cache in L1 or L2.
Change-Id: Id29a47b0fa7e16a21a7718949db802f85e9897c3
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66991
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
DispatchId should allocate two SGPRs instead of one. Allocating one was
causing all subsequent SGPR index values to be off by one, leading to
bad addresses for things like flat scratch and private segment. This
field is not used very often so it was not impacting most applications.
Change-Id: I17744e2d099fbc0447f400211ba7f8a42675ea06
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66711
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The ResetRequestPort and ResetResponsePort have a few problems:
1. A reset signal should happen during the time a reset is asserted,
or in other words the device should stay in reset and not doing
anything while reset is asserted. It should not immediately restart
execution while the reset is still held.
2. These names are misleading, since there is no response. These names
are inherited from other port types where there is an actual response.
There is a new generic SignalSourcePort and SignalSinkPort set of port
classes which are templated on the type of signal they propogate, and
which can be used in place of reset ports in c++. These ports can
still have a specialized role which will ensure that only reset ports
are connected to each other for a form of type checking, although
the underlying c++ instances are more interoperable than that.
Change-Id: Id98bef901ab61ac5b200dbbe49439bb2d2e6c57f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66675
Maintainer: Gabe Black <gabeblack@google.com>
Reviewed-by: Yu-hsin Wang <yuhsingw@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
These are largely compatibility wrappers around the Signal*Port
classes. The python versions of these types enforce more specific
compatibility, but on the c++ side the Signal*Port<bool> classes can
be used directly instead.
Change-Id: I1325074d0ed1c8fc6dfece5ac1ee33872cc4f5e3
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66673
Maintainer: Gabe Black <gabeblack@google.com>
Reviewed-by: Yu-hsin Wang <yuhsingw@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This patch fixes a problem that occurs when restoring from
a checkpoint where Mapped File Buffers are not restored. This
causes errors and unexpected behavior during further execution.
Since the checkpoint already has the size of the
area (address range) and the file name, only the offset is
missing to restore the Mapped File Buffer. Having the offset
value, it's possible to open those files for which an offset is
specified and create a VMA with a Mapped File Buffer.
Change-Id: Ib9dfa174cda6348b966b892184c36daeaba80e81
Signed-off-by: Emin Gadzhiev <e.gadzhiev.mhk@gmail.com>
Issue-On: https://gem5.atlassian.net/browse/GEM5-1302
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66311
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
When adding a long list of registers, it can be easy to miss one which
will offset all the registers after it. It can be hard to find those
sorts of problems, and tedious and error prone to fix them.
This change adds a mechanism to simply annotate what offset a register
should have. That should also make the register list more self
documenting, since you'll be able to easily see what offset a register
has from the source without having to count up everything in front of it.
Change-Id: Ia7e419ffb062a64a10106305f875cec6f9fe9a80
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66431
Reviewed-by: Yu-hsin Wang <yuhsingw@google.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
If a request is initiated by systemc, passed through TlmToGem5 bridge
and Gem5ToTlm bridge, it wouldn't have the systemc extension about the
association. This feature is also used in TlmToGem5 bridge to detect if
the packet is allocated in the current instance in async interface. In
that case, we would lose the association in the Gem5ToTlm bridge async
interface. For not making wide change, we need an extra way to support
the association in Gem5ToTlm bridge async interface.
This change adds another map to record the association and clears when
the TLM transaction is completed.
Change-Id: I486441e813236ea2cabd1bd6cbb085b08d75ec8f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66054
Reviewed-by: Gabe Black <gabe.black@gmail.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This partly reverts commit ec75787aef
by fixing the original problem noted by Bobby (long regressions):
setupThreadContext has to be implemented otherswise the GICv3 cpu interface
will end up holding old references when switching TC/ISAs.
This new implementation is still setting up the cpu interface reference
in the ISA only when it is required, but it is storing the
TC/ISA reference within the interface every time the ISA::setupThreadContext
gets called.
Change-Id: I2f54f95761d63655162c253e887b872f3718c764
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66291
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>