1. Adds newly added PyStat classes to "__init__.py", ensuring they can
all be accessed via a `m5.ext.pystats` import.
2. Simplifies the layout out "__init__.py" to just import all classes
from all files.
Change-Id: I43bfc5e7ff1aec837e661905304c6fb10b00c90e
This commit addresses Jason's comment
(https://github.com/gem5/gem5/pull/996#discussion_r1613870880) which
highlighted putting the `_m5.stats.processDumpQueue` call in the
iteration through the `root` object in `get_simstat` caused this
function be potentially called many times when it only needs to be
called once. This chance moved this call to just before this iteration
and will tehrefore only be called once (if required) per `get_simstat`
execution.
Change-Id: I16908b6dee063a0df7877a19e215883963bfb081
As Distribution inherits from Vector, it should be constructed with
a Dictionary of scalars (in our implementation, a dictionary mapping the
vector position's unique id for each bin and the value of that bin).
Change-Id: Ie603c248e5db4b6dd7f71cc453eebd78793f69a3
The big thing missing from the Vector stats was that each position in
the vector could have it's own unique id (a str, float, or int) and each
position in the vector can have its own description. Therefore, to add
this the Vector is represented as a dictionary mapping the unique ID to
a Pystat Scaler (whcih can have it's own unique description.
Change-Id: I3a8634f43298f6491300cf5a4f9d25dee8101808
This isn't a true Base class, it's just a Vector. In gem5 all Vectors
are Scalar Vectors. This change simplfies the naming.
Change-Id: Ib8881d854ab18de6acbf0fb200db2de6a43621a7
This change fixes#1148
I have only added an acknowledged return, as we dont ahve remote and
wrap mode so it can only be in stream mode.
Change-Id: I1882042d873ff0e9465c9491238554c8fbb9aa76
Those were not part of the performTlbi switch and simulation was
therefore panicking when they were encountered
Change-Id: Ifbe0b89e45539df4abc147ac5970b0caf0d9dfdc
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This commit fixes and refactors the implementation of viota. It also
overrides the generateDisassembly function in viota's macro/micro to
correctly print out the instruction when tacing/debugging.
For example, it changes from:
viota_m vd, vd, vs2, v0.t
to:
viota_m vd, vs2, v0.t
This adds two failsafes which may cause a panic on some machines. First,
check the host machine has the KVM XCR capability before calling getXCRs
or setXCRs. Second, ensure the x87 bit, which must always be one, will
always return at least one by modifying the return value in readMiscReg.
Change-Id: I5e778acc926a47443ef6cef29fabd84eb69bb9ba
This implements some missing loads and store that are commonly used in
applications with MFMA instructions to load 16-bit data types into
specific register locations: DS_READ_U16_D16, DS_READ_U16_D16_HI,
BUFFER_LOAD_SHORT_D16, BUFFER_LOAD_SHORT_D16_HI.
Change-Id: Ie22d81ef010328f4541553a9a674764dc16a9f4d
Add a unit test for the MXFP types (bf16, fp16, fp8, bf8). These types
are not currently operated on directly. Instead the are cast to float
values and then arithmetic is performed. As a result, the unit test
simply checks that when we convert a value from MXFP type to float and
back that the values of the MXFP type match. Exact values are used to
avoid discrepancies with rounding.
Can be run using scons build/VEGA_X86/unittests.opt .
Change-Id: I596e9368eb929d239dd2d917e3abd7927b15b71e
These instructions are used in some of the F16 MFMA example applications
to convert to/from floating point types.
Change-Id: I7426ea663ce11a39fe8c60c8006d8cca11cfaf07
This instruction is new in MI300 and is used in some of the example
applications used to test MFMAs.
Change-Id: I739f8ab2be6a93ee3b6bdc4120d0117724edb0d4
This adds the decodings for all of the matrix fused multiply add (MFMA)
and sparse matrix fused multiply accumulate (SMFMAC) instructions up to
and including MI300. This does not yet provide the implementation for
these instructions, however it is easier and less tedious to add them in
bulk rather that one at a time.
Change-Id: I5acd23ca8a26bdec843bead545d1f8820ad95b41
The microscaling formats (MXFP) and INT8 types require additional size
checks which are not needed for the current MFMA template. The size
check is done using a constexpr method exclusive to the MXFP type,
therefore create a special class for MXFP types. This is preferrable to
attempting to shoehorn into the existing template as it helps with
readability. Similar, INT8 requires a size check to determine number of
elements per VGPR, but it not an MXFP type. Create a special template
for that as well.
This additionally implements all of the MFMA types which have test cases
in the amd-lab-notes repository (https://github.com/amd/amd-lab-notes/).
The implementations were tested using the applications in the
matrix-cores subfolder and achieve L2 norms equivalent or better than
MI200 hardware.
Change-Id: Ia5ae89387149928905e7bcd25302ed3d1df6af38
This class can be used to load multiple operand dwords into an array and
then select bits from the span of that array. It handles cases where the
bits span two dwords (e.g., you have four dwords for a 128-bit value and
want to select bits 35:30) and cases where multiple values < 32-bits are
packed into a single dword (e.g., two bf16 values).
This is most useful for packed arrays and instructions which have more
than two dwords. Beyond two dwords, the operator[] overload of
VectorOperand is not available requiring additional logic to select from
an operand. This helper class handles that additional logic itself.
Change-Id: I74856d0f312f7549b3b6c405ab71eb2b174c70ac
The open compute project (OCP) microscaling formats (MX) are used in the
GPU model. The specification is available at [1]. This implements a C++
version of MXFP formats with many constraints that conform to the
specification.
Actually arithmetic is not performed directly on the MXFP types. They
are rather converted to fp32 and the computation is performed. For most
of these types this is acceptable for the GPU model as there are no
instruction which directly perform arithmetic on them. For example, the
DOT/MFMA instructions operating may first convert to FP32 and then
perform arithmetic.
Change-Id: I7235722627f7f66c291792b5dbf9e3ea2f67883e
Release of MI300X simulation capability:
- Implements the required MI300X features over MI200 (currently only
architecture flat scratch).
- Make the gpu-compute model use MI200 features when MI300X / gfx942 is
configured.
- Fix up the scratch_ instructions which are seem to be preferred in
debug hipcc builds over buffer_.
- Add mi300.py config similar to mi200.py. This config can optionally
use resources instead of command line args.
It appears we have been trying to read 64-bit arguments for ARM32 since
695583709b. I noticed that SYS_OPEN was
trying to read a really long string as the pathname argument and it
turned out it was reading from the wrong stack offset. With this change
I can successfully run some of the semihosting tests for ARM32.
Change-Id: Ie154052dac4211993fb6c4c99d93990123c2eacf
In BaseSemihosting::readString() we were using the len argument to
allocate a std::vector without checking whether the value makes any
sense. This resulted in a std::bad_alloc exception being raised prior to
https://github.com/gem5/gem5/pull/1142 for my semihosting tests. This
commit prevents semihosting from reading more than 64K for string
arguments which should be more than sufficient for any valid code.
Change-Id: I059669016ee2c5721fedb914595d0494f6cfd4cd
This commit fixes the implementation of vrgather instruction based on
rvv 1.0.
In section 16.4. Vector Register Gather Instructions,
> Vector-scalar and vector-immediate forms of the register gather are
also provided. These read one element from the source vector at the
given index, and write this value to the active elements of the
destination vector register. The index value in the scalar register and
the immediate, zero-extended to XLEN bits, are treated as unsigned
integers. If XLEN > SEW, the index value is not truncated to SEW bits.
The fix zero-extends the index value in the scalar register and the
immediate.
Architected flat scratch is added in MI300 which store the scratch base
address in dedicated registers rather than in SGPRs. These registers are
used by scratch_ instructions. These are flat instruction which
explicitly target the private memory aperture. These instructions have a
different address calculation than global_ instructions.
This change implements architected flat scratch support, fixes the
address calculation of scratch_ instructions, and implements decodings
for some scratch_ instructions. Previous flat_ instructions which happen
to access the private memory aperture have no change in address
calculation. Since scratch_ instructions are identical to flat_
instruction except for address calculation, the decodings simply reuse
existing flat_ instruction definitions.
Change-Id: I1e1d15a2fbcc7a4a678157c35608f4f22b359e21
Add support for the following two extensions:
[Zvfh](https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#185-zvfh-vector-extension-for-half-precision-floating-point):
Vector Extension for Half-Precision Floating-Point
[Zvfhmin](https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#184-zvfhmin-vector-extension-for-minimal-half-precision-floating-point):
Vector Extension for Minimal Half-Precision Floating-Point
For instructions (`vfncvt[.rtz].x[u].f.w`) and (`vfwcvt.f.x[u].v`) which
will become defined when `SEW = 8`, a new template
`VectorFloatWideningAndNarrowingCvtDecodeBlock` is added and 8-bit
floating point type (`float8_t`) is defined.
The data type `float8_t` is introduced in the newer `3e` version of the
SoftFloat Package, however, the current version in use is `3d` which
does not include this definition. Despite this, `float8_t` is utilized
solely for constructing the `vfncvt[.rtz].x[u].f.w` and
`vfwcvt.f.x[u].v` instructions when `SEW = 8`. There are no operations
that directly manipulate data of the `float8_t` type.
This is the version for MI300. For the most part, it is the same as
MI200 with the exception of architected flat scratch (not yet
implemented in gem5) and therefore a new version enum is required.
Change-Id: Id18cd7b57c4eebd467c010a3f61e3117beb8d58a
The SIMPOINT_BEGIN should do nothing by default since it might be used
in various cases.
In
[https://www.mail-archive.com/gem5-users@gem5.org/msg22383.html](mailing
list), a user discovered a bug with the current
`simpoints-se-restore.py` example.
The bug is caused by the default behavior of the SIMPOINT_BEGIN exit
event.
When taking a checkpoint with `simpoints-se-checkpoint.py`, it stores
the future exit event scheduled at the beginning of the simulation. I
did not notice this when I wrote and tested the example script due to
the long print out log and my custom handler of the SIMPOINT_BEGIN exit
event.
In the restoring, the SIMPOINT_BEGIN exit event was triggered right
before the region end, so it resets the stats before the final stats
dump. Therefore, the simulation time is 0 as the user discovered.
This patch should fix this bug.
Change-Id: I800dfbd28d7b2c842864a1ab7d84b8f8e17b9b3c
This PR implements FEAT_MPAM on the CPU side. We define a MPAM system
registers and a mechanism
for tagging memory requests with the MPAM information bundle as
specified in existing documentation [1].
What this PR is *not* covering is the MPAM implementation in a MSC
(Memory System Component).
Which means at the moment it's only possible to have static partitioning
schemes (via the PartitioningPolicies
already part of gem5) and there is currently no way to dynamically
program partitions at runtime.
[1]: https://developer.arm.com/documentation/ddi0487/latest/
Quote from change[1]
> The RISC-V spec clarifies the CSR instruction operation, some of them
shall not read or write CSR by the hints of RD/RS1/uimm, but the
original version use the 'data != oldData' condition to determine
whether write or not, and always read CSR first.
See CSR instruction in spec:
Section 9.1 Page 56 of
https://github.com/riscv/riscv-isa-manual/releases/download/Ratified-IMAFDQC/riscv-spec-20191213.pdf
|||Register operand|||
|--- |--- |--- |--- |--- |
|Instruction|rd is x0|rs1 is x0|Reads CSR|Writes CSR|
|CSRRW|Yes|-|No|Yes|
|CSRRW|No|-|Yes|Yes|
|CSRRS/CSRRC|-|Yes|Yes|No|
|CSRRS/CSRRC|-|No|Yes|Yes|
|||Immediate operand|||
|Instruction|rd is x0|uimm = 0|Reads CSR|Writes
CSR|
|CSRRWI|Yes|-|No|Yes|
|CSRRWI|No|-|Yes|Yes|
|CSRRSI/CSRRCI|-|Yes|Yes|No|
|CSRRSI/CSRRCI|-|No|Yes|Yes|
The issue cause the ubuntu hanging because we shared the same status CSR
with `mstatus`, `sstatus` and `ustatus` and interrupt enabling CSR with
mip, sip and uip. We may need to read origin CSR without effect of
unmask bits to avoid override the bits of other CSR. Now the ubuntu can
work after the patch merged.
[1] https://gem5-review.googlesource.com/c/public/gem5/+/67717
The destination select should take a value of the selection size (dword,
word, or byte) starting at bit 0, move that to the selected destination,
and then apply the unused constraint (DST_U) to the remaining word or
bytes. Currently the code is selecting the word/byte currently being
iterated over, rather than the least significant word/byte. As a result,
any selection that is not word 0 or byte 0 will be replaced with the
original destination value at those bits. This results in the wrong
value.
This commit changes the orig bits to be the original dest value at the
lowest word / byte location. Tested with the mfma_i32_16x16x16i8 example
which uses an SDWA V_OR_B32 to pack i8 values into VGPRs for the MFMA.
Change-Id: I54ed819479a25fa9276d29a8f14f0fea7fd71afe
This PR includes a check for `m_pkt` being null and appropriately
handles that case. This issue was causing the Daily tests to fail.
Change-Id: I87142ca14ca4ab3d8306153a1cf34c2629a119ba
This patch is adding the NS bit to CHI requests to make sure they are
properly tagged according to their security
Change-Id: I33d3610edefbb5a05a6090e9125c35d4fb8bca58
Reviewed-by: Tiago Muck <tiago.muck@arm.com>
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The extended control registers were not being updated in the KVM thread
context nor updated in the KVM state. This was causing issues when
checkpointing since the XCR0 value was reverting to the default value
rather than what it was previously before the checkpoint. THis was
causing multiple applications to crash due to executing instructions
which are now illegal instructions due to XCR0 being incorrect.
This commit adds the XCR0 as a misc register similar to the exiting x86
control registers and adds all of the helper functions to access and set
the register value. It also adds support for updating the KVM CPU's
state with the register value and updating the thread context's misc reg
value so that it is checkpointed along with the other misc regs.
Note that this does *not* add support for XSAVE of the AVX state (i.e.,
the upper 128 bits of YMM registers). It does however fix the immediate
problem in issue #958 .
Change-Id: I97456c8b57cbc7b381bd4be94944ce6567a43c76
Includes fixes for several bugs reported via email, self found, and
internal reports. Also includes runs through Valgrind and UBsan. See
individual commits for more details.
The scalar cache is not being invalidated which causes stale data to be
left in the scalar cache between GPU kernels. This commit sends
invalidates to the scalar cache when the SQC is invalidated. This is a
sufficient baseline for simulation.
Since the number of invalidates might be larger than the mandatory queue
can hold and no flash invalidate mechanism exists in the VIPER protocol,
the command line option for the mandatory queue size is removed, which
is the same behavior as the SQC.
Change-Id: I1723f224711b04caa4c88beccfa8fb73ccf56572