This PR implements a few changes related to the accumulation offset,
which is new in MI200. MI100 contained two vector register files: the
architectural and the accumulation register files. In MI200 these have
been unified, and the architectural register file is twice the size. As
a result, the dispatch packet sets an offset into the unified vector
register file that marks where the former accumulation registers begin.
The changes are:
- Calculate the accumulation offset from the dispatch packet and store it
in the HSA task.
- Update the accumulation move instructions (v_accvgpr_read/write) to
use it.
- Update the current MFMA instructions to use it.
- Clean up the MFMA examples.
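For illustration only, the sketch below decodes an accumulation offset from a raw register value. It assumes the offset comes from the 6-bit `ACCUM_OFFSET` field (bits 5:0 of `COMPUTE_PGM_RSRC3`) with a granularity of 4 VGPRs, as documented for GFX90A in LLVM's AMDGPU backend documentation. `decodeAccumOffset` is a hypothetical name and not gem5's implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: decode the 6-bit ACCUM_OFFSET field, assumed to sit
// in bits 5:0 of COMPUTE_PGM_RSRC3. A field value of N encodes an offset of
// (N + 1) * 4 VGPRs, so the offset ranges from 4 (N = 0) to 256 (N = 63).
inline uint32_t
decodeAccumOffset(uint32_t pgmRsrc3)
{
    const uint32_t field = pgmRsrc3 & 0x3f;  // ignore all other bits
    return (field + 1) * 4;
}
```

For example, `decodeAccumOffset(3)` yields 16, meaning the first former accumulation register would sit at unified VGPR index 16.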
Initialize x86 process' max stack size to the value given in the process
params, rather than hard-coding it to 8 MB, which made it impossible to
run x86 programs requiring more than 8 MB of stack.
Change-Id: I0b17fe60b016b1e4a82d704ef7ad367974ea6a08
For some simulations with large VLEN values (e.g. 8k and 16k), more
packets were created on the fly, and as a consequence the simulations
failed. The sanity-check limit has been increased to handle these
high-VLEN cases.
Supervised by [@aarmejach](https://github.com/aarmejach)
Change-Id: I137b0f3113687b3fc9c4154d19ca5e8017e6e992
Co-authored-by: Adrià Armejach <adria.armejach@bsc.es>
Adds categorization of bypassed atomics in TCC to the TBE as either
return or no-return, which gets consumed in pa_performAtomic to
determine if atomic logs should be stored.
Reestablishes TCC bypassed atomics after #546.
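The following is a minimal sketch of the idea. It assumes atomic logs are only needed when the old value must be returned to the requestor. `AtomicKind`, `Tbe`, and `shouldStoreAtomicLog` are hypothetical stand-ins for the SLICC TBE field and the check made in pa_performAtomic.

```cpp
#include <cassert>

// Hypothetical categorization recorded in the TBE for a bypassed atomic.
enum class AtomicKind { None, Return, NoReturn };

struct Tbe
{
    AtomicKind atomicKind = AtomicKind::None;
};

// Only atomics that send the old value back need an atomic log.
inline bool
shouldStoreAtomicLog(const Tbe &tbe)
{
    return tbe.atomicKind == AtomicKind::Return;
}
```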
Change-Id: Ibc1fa2b795ef1c47c3893a0b1911fa7993522d38
Bypassed write-through requests on invalid lines in the TCC should be
written through to the directory. This transition was previously missing.
Change-Id: I16b117c4e085ce6be0ed5297aa0129d52cd35a51
The original version of `list_changes.py` assumed no more than one
`Change-Id` tag per commit. However, since transitioning to GitHub, the
repository now contains some merge commits containing multiple
`Change-Id`s.
This patch updates `list_changes.py` to support commits with any number
of `Change-Id` tags.
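`list_changes.py` itself is Python. The hypothetical C++ function below only illustrates the fixed behavior: it collects every `Change-Id:` trailer in a commit message instead of assuming there is exactly one.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: return all Change-Id values found in a message.
inline std::vector<std::string>
changeIds(const std::string &message)
{
    const std::string tag = "Change-Id:";
    std::vector<std::string> ids;
    std::istringstream in(message);
    std::string line;
    while (std::getline(in, line)) {
        // Only lines that start with the trailer tag are considered.
        if (line.compare(0, tag.size(), tag) != 0)
            continue;
        std::string id = line.substr(tag.size());
        // Trim surrounding whitespace around the identifier.
        const auto first = id.find_first_not_of(" \t\r");
        const auto last = id.find_last_not_of(" \t\r");
        if (first != std::string::npos)
            ids.push_back(id.substr(first, last - first + 1));
    }
    return ids;
}
```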
Fix QoS Memory Queue Policies
* Fix assertions in LRG policy to correctly assert requestor and list
validity
* Fix `selectPacket()` in LIFO Queue Policy to correctly return the end
of the `deque` backing store for its packet queue
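Here is a small sketch of the two selection rules on a `std::deque`. It assumes FIFO serves the oldest packet (front) and LIFO serves the newest one (back). `selectFifo` and `selectLifo` are hypothetical free functions, not the gem5 `QueuePolicy` interface.

```cpp
#include <cassert>
#include <deque>
#include <iterator>

// FIFO: the oldest packet is at the front of the deque.
template <typename T>
typename std::deque<T>::iterator
selectFifo(std::deque<T> &q)
{
    return q.begin();
}

// LIFO: the newest packet is the last element. end() itself points one
// past it and must not be dereferenced. The caller ensures q is non-empty.
template <typename T>
typename std::deque<T>::iterator
selectLifo(std::deque<T> &q)
{
    return std::prev(q.end());
}
```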
This commit updates the two existing MFMA instructions to support the
accumulation offset for the A, B, and C/D matrices. It also uses
array-indexed C/D matrix registers to reduce duplicate code. Future MFMA
instructions use up to 16 registers for C/D, so this reduces the amount
of code that has to be written.
Change-Id: Ibdc3b6255234a3bab99f115c79e8a0248c800400
In MI200, the accum offset is used as an index into the unified VGPR
file, so these instructions are no longer equivalent to a plain move when
accum_offset in the dispatch packet is non-zero.
Change these instructions to use the stored accum_offset value.
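Below is a simplified model of the index mapping. It assumes accumulation register a[i] lives at unified index accum_offset + i. `UnifiedVgprFile` is hypothetical and models a single lane. With an offset of 0, both instructions reduce to a plain VGPR-to-VGPR move.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical single-lane model of the unified VGPR file.
struct UnifiedVgprFile
{
    std::vector<uint32_t> regs;
    uint32_t accumOffset;

    UnifiedVgprFile(std::size_t numRegs, uint32_t offset)
        : regs(numRegs, 0), accumOffset(offset) {}

    // v_accvgpr_write: copy v[vSrc] into a[accDst].
    // at() throws if an index falls outside the file.
    void accvgprWrite(uint32_t accDst, uint32_t vSrc)
    {
        regs.at(accumOffset + accDst) = regs.at(vSrc);
    }

    // v_accvgpr_read: copy a[accSrc] into v[vDst].
    void accvgprRead(uint32_t vDst, uint32_t accSrc)
    {
        regs.at(vDst) = regs.at(accumOffset + accSrc);
    }
};
```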
Change-Id: Ib661804f8f5b8392e4c586082c423645f539e641
The accumulation offset is needed for some instructions. In order to
access this value we need to place it somewhere instruction definitions
can access. The most logical place is in the wavefront.
This commit simply copies the value from the HSA task to the wavefront
object.
Change-Id: I44ef62ef32d2421953f096c431dd758e882245b4
Fix #874, in which running se.py with 4GB or more of memory (via the
--mem-size=4GB option) caused all KVM programs to crash or hang. This
occurred because the m5ops address range (set to 0xFFFF0000-0x100000000)
overlapped with physical memory in such a configuration.
This patch fixes the bug by moving the m5ops address range if physical
memory is >=4GB.
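The overlap can be checked with half-open ranges. The hypothetical `HalfOpenRange` below is not gem5's `AddrRange`. It only shows why 4 GiB of memory starting at address 0 collides with the m5ops range, while 3 GiB does not.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical half-open address range [start, end).
struct HalfOpenRange
{
    uint64_t start;
    uint64_t end;
};

// Two half-open ranges overlap if each one starts before the other ends.
inline bool
overlaps(const HalfOpenRange &a, const HalfOpenRange &b)
{
    return a.start < b.end && b.start < a.end;
}
```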
Change-Id: Ic8a004517bc2be2c27860ed314460be749a11dc1
Update the PLIC based on the
[riscv-plic-spec](https://github.com/riscv/riscv-plic-spec). This PR:
- Supports customized PLIC hart ID and privilege mode configuration.
- Is backward compatible with the n_contexts parameter, from which it
generates a config such as {0,M}, {0,S}, {1,M}, ...
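The backward-compatible expansion can be sketched as follows. It assumes each hart contributes an M-mode context followed by an S-mode context. `contextsFromCount` and `PrivMode` are hypothetical names and are not taken from gem5's PLIC model.

```cpp
#include <cassert>
#include <utility>
#include <vector>

enum class PrivMode { M, S };

// Hypothetical expansion of a legacy n_contexts value into
// (hart ID, privilege mode) pairs: {0,M}, {0,S}, {1,M}, {1,S}, ...
inline std::vector<std::pair<int, PrivMode>>
contextsFromCount(int nContexts)
{
    std::vector<std::pair<int, PrivMode>> config;
    for (int ctx = 0; ctx < nContexts; ++ctx) {
        config.emplace_back(ctx / 2,
                            ctx % 2 == 0 ? PrivMode::M : PrivMode::S);
    }
    return config;
}
```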
Change-Id: Ibff736827edb7c97921e01fa27f503574a27a562
When ReadShared hits a UD line and there are no sharers, this change
makes the downstream cache pass Dirty to the requestor whenever possible,
even though it doesn't deallocate the line. The requestor then becomes
SD and the downstream cache becomes UD_RSD.
In the previous implementation, a loosely exclusive intermediate cache
could cause loss of dirty data. An example error condition follows.
Configurations
L2 cache: Roughly inclusive to L1 without back-invalidation
- dealloc_on_* = false
- dealloc_backinv_* = false
L3 cache: Roughly exclusive to L2 without back-invalidation
- alloc_on_readshared = true
- alloc_on_readunique = false
- dealloc_on_shared = false
- dealloc_on_unique = true
- dealloc_backinv_* = false
- is_HN = false
LLC: Same clusivity as L3 except is_HN = true
For all caches, allow_SD = true and fwd_unique_on_readshared = false
Example problem sequence:
1. L1 sends ReadUnique then becomes UD. L2 is UC_RU. L3 and LLC are RU.
2. L1 evicts the line to L2 by WriteBackFull (UD_PD). L2 becomes UD.
3. L2 evicts the line to L3 using WriteBackFull (UD_PD). L3 becomes UD.
4. L1 reads the line with ReadShared which misses on L2.
5. L2 reads the line with ReadShared, which hits on L3. L3 becomes UD_RSC
because it doesn't deallocate the line (dataToBeInvalid=false).
6. L3 evicts the line to LLC by WriteCleanFull (UD_PD) because L3
doesn't back-invalidate and still has a sharer. The local cache line is
invalidated by Deallocate_CacheBlock. L3 becomes RUSC and LLC becomes
UD_RU.
7. When UD_RU is evicted at LLC, the UD_RU line is dropped in the
expectation that the upstream cache will write it back, causing loss of
dirty data.
The empty constructor prevents zero-initialization from working
correctly. This change fixes the issue by removing the unwanted empty
constructor. It also changes the default destructor specification to
C++11 style.
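Here is a minimal illustration of the problem, using hypothetical `Before` and `After` types. When a class has a user-provided empty constructor, value-initialization simply calls it, so members are left indeterminate. Without that constructor, `{}` zero-initializes them.

```cpp
#include <cassert>

// Before: Before{} calls the user-provided empty constructor,
// which leaves `value` indeterminate (reading it is undefined behavior).
struct Before
{
    int value;
    Before() {}
    ~Before() {}
};

// After: no user-provided constructor and a C++11-style defaulted
// destructor, so After{} initializes `value` to 0.
struct After
{
    int value;
    ~After() = default;
};
```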
Change-Id: I869a93ca5283f811c2aa58406f1478459e0d7022
* Fix selectPacket() in LIFO Queue Policy to correctly return the end of
the `deque` backing store for its packet queue
* Move selectPacket() implementations for FIFO and LIFO queues into
`q_policy.cc` file
Change-Id: I8c35e5fc83dc380b19f52be14c18b1f414f9e141
According to the RISC-V spec [1], floating-point instructions accumulate
exception flags into the FFLAGS register rather than overwriting it, as a
CSR write would.
In the previous implementation, we read FFLAGS, set the exception flags,
and wrote the result back to FFLAGS. This works in the gem5 simple and
minor CPU models because the value is written to `regFile` right after
the instruction executes. In the gem5 O3 CPU model, however, a write to
a `miscReg` in the execute stage is recorded in the `destMiscReg` buffer
until the commit stage, so the next instruction reads the stale FFLAGS
and produces an incorrect result.
This CL introduces `MISCREG_FFLAGS_EXE` and keeps `miscRegFile` the same
size, because `MISCREG_FFLAGS_EXE` and `MISCREG_FFLAGS` share the same
storage. When a floating-point instruction executes, its exception flags
should be updated via `MISCREG_FFLAGS_EXE`, which accumulates them into
FFLAGS in the `setMiscReg` method. `MISCREG_FFLAGS` should only be
written by CSROp.
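The following is a simplified model of the split. It assumes both register indices alias the same 5-bit FFLAGS storage, and `MiscRegs` is hypothetical. Because the accumulation happens inside the write itself, a floating-point instruction never needs to read a possibly stale FFLAGS value first.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the two register indices.
enum MiscRegIndex { MISCREG_FFLAGS, MISCREG_FFLAGS_EXE };

struct MiscRegs
{
    uint64_t fflags = 0;  // NV, DZ, OF, UF, NX in bits 4:0

    void setMiscReg(MiscRegIndex idx, uint64_t val)
    {
        if (idx == MISCREG_FFLAGS_EXE)
            fflags |= (val & 0x1f);  // FP instruction: accumulate flags
        else
            fflags = (val & 0x1f);   // CSR write: overwrite flags
    }
};
```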
[1] Syntactic Dependencies: Appendix A
c80ecada1c/src/mm-eplan.adoc (syntactic-dependencies-rules-9-11)
gem5 issue: https://github.com/gem5/gem5/issues/755
Change-Id: Ib7f13d95b8a921c37766a54a217a5a4b1ef17c6f
Fix #876. The x87 floating-point control word (FCW) was not initialized
at process startup in syscall emulation mode. This resulted in
floating-point exceptions in KVM mode when executing x87 floating-point
instructions.
This patch fixes the bug by initializing FCW to its reset value, 0x37F.
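For reference, 0x37F is the value FNINIT loads into the FCW. The illustrative helpers below decode it:
- All six exception masks are set.
- Precision control selects double-extended precision.
- Rounding control selects round-to-nearest.

If the FCW were instead left at zero, every exception would be unmasked.

```cpp
#include <cassert>
#include <cstdint>

// Reset value loaded by FNINIT/FINIT.
constexpr uint16_t FcwResetValue = 0x37F;

// Bits 5:0: exception masks (IM, DM, ZM, OM, UM, PM).
constexpr uint16_t fcwExceptionMasks(uint16_t fcw)
{
    return static_cast<uint16_t>(fcw & 0x3F);
}

// Bits 9:8: precision control (0b11 = double-extended precision).
constexpr uint16_t fcwPrecisionControl(uint16_t fcw)
{
    return static_cast<uint16_t>((fcw >> 8) & 0x3);
}

// Bits 11:10: rounding control (0b00 = round to nearest).
constexpr uint16_t fcwRoundingControl(uint16_t fcw)
{
    return static_cast<uint16_t>((fcw >> 10) & 0x3);
}
```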
Change-Id: Idd1573c6951524ef59466cc5c9f1e640ea7658ae
ArmSigInterruptPin doesn't send the interrupt to the GIC. Instead, it
sends the interrupt to the irq specified in its Param. When using
ArmSigInterruptPin, we shouldn't ask users to provide a "Platform", since
the pin doesn't need one. To reduce confusion, this change removes
ArmSigInterruptPin's dependency on Platform.
Change-Id: I0ee507ed1c08b4fa6d3e384e28732f3acb4f6892
This PR fixes https://github.com/gem5/gem5/issues/668. The first commit
fixes it for all ISAs other than Arm by setting the number of
architectural matrix registers to 0 for the ISAs that do not use them.
The second commit partly fixes it for Arm as well: by removing
RenameMap::numFreeEntries, renaming no longer stalls unless a matrix
instruction is encountered. This means most binaries will run with SMT
as long as they don't use FEAT_SME instructions. Note that this is not
simply an SMT fix; it more generally addresses a shortcoming in the way
we were renaming instructions.
If an Arm binary wants to use SMT with FEAT_SME, the fourth commit makes
sure the lack of physical registers is reported explicitly at the
beginning of simulation, rather than silently blocking renaming.
When processing memory Packets for prefetch, the `PrefetchInfo` class
constructor will attempt to copy the `Packet` data. In cases where the
`Packet` under consideration does not contain data, an assertion will be
triggered in the Packet's `getConstPtr` method, causing the simulation
to crash.
This problem was first exposed by Bug #580 when processing an
`UpgradeReq` memory packet.
This patch addresses the problem by suppressing the copying of the
`Packet` data during construction of a `PrefetchInfo` object in cases
where the `Packet` has no data.
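Below is a sketch of the guarded copy. It uses a hypothetical `Packet` stand-in whose `hasData()` simply checks for a non-empty payload (gem5's real `Packet` is more involved). `PrefetchInfoSketch` is likewise hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical packet that may or may not carry data.
struct Packet
{
    std::vector<uint8_t> data;  // empty for e.g. an UpgradeReq

    bool hasData() const { return !data.empty(); }

    const uint8_t *getConstPtr() const
    {
        assert(hasData());  // fires if called on a data-less packet
        return data.data();
    }
};

// Copy the payload only when the packet actually carries one.
struct PrefetchInfoSketch
{
    std::unique_ptr<uint8_t[]> data;
    std::size_t size = 0;

    explicit PrefetchInfoSketch(const Packet &pkt)
    {
        if (pkt.hasData()) {
            size = pkt.data.size();
            data = std::make_unique<uint8_t[]>(size);
            std::memcpy(data.get(), pkt.getConstPtr(), size);
        }
    }
};
```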
This patch addresses Bug #580 [1], which was exposed by PR #564 [2]
(subsequently reverted by PR #581 [3]).
[1] https://github.com/gem5/gem5/issues/580
[2] https://github.com/gem5/gem5/pull/564
[3] https://github.com/gem5/gem5/pull/581
Change-Id: Ic1e828c0887f4003441b61647440c8e912bf0fbc
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Those are supposed to control trapping for accesses to debug registers.
Change-Id: I4a25a379e718ea6d5ea8ae22ac7edbeb452d1836
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
`assert(interruptID >= 0)` is always true because `interruptID` is an
unsigned int.
This caused compilation test failures with GCC 8, with the following
error:
```sh
src/arch/riscv/interrupts.cc:47:32: error: comparison is always true due to limited range of data type [-Werror=type-limits]
assert(interruptID >= 0);
```
Change-Id: I356be78d7f75ea5d20d34768fb8ece0f746be2fc
This PR adds support for SQC (GPU I-cache) invalidation to the GPU
model. It does this by updating the GPU-VIPER-SQC protocol to support
flushes, the sequencer model to send out invalidates, and the GPU compute
model to send invalidates and handle responses. It also adds support for
S_ICACHE_INV, a VEGA ISA instruction that invalidates the entire GPU
I-cache. Additionally, the PR modifies the kernel start behavior to
invalidate the I-cache too. It previously invalidated only the L1
D-cache.
Previously, the S_ICACHE_INV instruction was unimplemented and
simulation panicked if it was encountered. This commit adds support for
executing the instruction by injecting a memory barrier in the scalar
pipeline and invalidating the ICACHE (or SQC).
Change-Id: I0fbd4e53f630a267971a23cea6f17d4fef403d15
In ComputeUnit, a previous commit added a SystemHubEvent event class to
the SQCPort. This was found to be unnecessary during the review process
and is removed in this commit. Similarly, invBuf(), which was added to
FetchUnit in an earlier commit, was found to be redundant. This commit
removes it.
Change-Id: I6ee8d344d29e7bfade49fb9549654b71e3c4b96f
Previously, the data caches were invalidated at the start of each
kernel. This commit adds support for invalidating the instruction cache
at kernel launch time.
Change-Id: I32e50f63fa1442c2514d4dd8f9d7689759f503d3
This commit adds support for injecting a scalar memory barrier in the
GPU. The barrier will primarily be used to invalidate the entire SQC
cache. The commit also invalidates all buffers and decrements related
counters upon completion of the invalidation request.
Change-Id: Ib8e270bbeb8229a4470d606c96876ba5c87335bf
This commit adds support for cache invalidation in GPU VIPER protocol's
SQC cache. To support this, the commit also adds L1 cache invalidation
framework in the Sequencer such that the Sequencer sends out an
invalidation request for each line in the cache and declares completion
once all lines are evicted.
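The completion logic described above can be sketched as a set of pending line addresses. The sketch assumes one invalidation is issued per valid line and that each eviction is acknowledged once. `InvalidationTracker` is hypothetical and not part of gem5's Sequencer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// Hypothetical tracker: one invalidation per valid line, and completion
// once every issued invalidation has been acknowledged.
class InvalidationTracker
{
  public:
    // Issue invalidations for all currently valid lines; return how many.
    std::size_t start(const std::unordered_set<uint64_t> &validLines)
    {
        pending = validLines;
        return pending.size();
    }

    // Called when the cache confirms that a line has been evicted.
    void lineEvicted(uint64_t addr) { pending.erase(addr); }

    bool done() const { return pending.empty(); }

  private:
    std::unordered_set<uint64_t> pending;
};
```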
Change-Id: I2f52eacabb2412b16f467f994e985c378230f841
This PR removes the default value for `QoSMemSinkCtrl.interface`. This
breaks a circular dependency between `QoSMemSinkCtrl` and
`QoSMemSinkInterface` that prevented the `controller()` function of
`QoSMemSinkInterface` from being used.
Change-Id: I4ecc39b974e239be1a2e9285e1f6f8ea873c018d
The vlm.v and vsm.v unit-stride mask load/store instructions were
constructed with an incorrect VL when the current VL is larger than
VLEN/EEW (i.e. when LMUL > 1). This commit fixes the issue for both
instructions.
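For reference, the RISC-V V specification defines the effective length of vlm.v and vsm.v as evl = ceil(vl/8) byte elements (EEW = 8), independent of SEW and LMUL. Because vl never exceeds VLEN, evl never exceeds VLEN/8, so the mask always fits in one vector register. The helper below is an illustrative sketch, not the code changed by this commit.

```cpp
#include <cassert>
#include <cstdint>

// Effective length, in bytes, of a vlm.v / vsm.v access: one mask bit
// per element, rounded up to whole bytes.
constexpr uint32_t maskLoadStoreEvl(uint32_t vl) { return (vl + 7) / 8; }
```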
We were having some difficulty running this `add-apt-repository`
command on a server, due to suspected firewall issues. On further
inspection it appears to be superfluous, as git can easily be obtained
through `apt-get` without adding this repository.
This patch provides unit-stride fault-only-first loads (i.e. vle*ff) for
the RISC-V architecture.
They are implemented within the regular unit-stride load (i.e. vle*). A
snippet named `fault_code` is inserted with templating to change their
behaviour to fault-only-first.
Apart from this, a new micro based on the vset\*vl\* instructions
(VlFFTrimVlMicroOp) is inserted as the last micro in the macro
constructor to trim the VL to its corresponding length based on the
faulting index.
This trimming micro waits for the load micros to finish (via data
dependency) and has a reference to the other micros to check whether
they faulted or not. The new VL is calculated with the VL of each micro,
stopping on the first faulting one (if there's such a fault).
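The trimming rule can be sketched as follows. The sketch assumes each micro reports its element count and, if it faulted, the index of its first faulting element. `MicroResult` and `trimmedVl` are hypothetical. Per the V spec, a fault on element 0 of the whole access still raises the trap rather than trimming VL; this sketch does not model that case.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical per-micro result: the number of elements the micro covers
// and, if it faulted, the index of its first faulting element.
struct MicroResult
{
    uint32_t vl;
    std::optional<uint32_t> faultIdx;
};

// Sum element counts up to the first faulting micro, then stop at the
// faulting element. With no fault, the result is the sum of all micros.
inline uint32_t
trimmedVl(const std::vector<MicroResult> &micros)
{
    uint32_t newVl = 0;
    for (const auto &m : micros) {
        if (m.faultIdx) {
            newVl += *m.faultIdx;
            break;
        }
        newVl += m.vl;
    }
    return newVl;
}
```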
I've tested this with VLEN=128,256,...,16384 and all the corresponding
SEW+LMUL configurations.
Change-Id: I7b937f6bcb396725461bba4912d2667f3b22f955
This commit allows CompData_SD to be sent when ReadShared hits on a UD
line and the local cache keeps the line, unless the request doesn't allow SD.
Change-Id: I337f24c871cc4c19c5b5fb11f9b35c0a8eb7911c
When ReadShared hits a UD line and there are no sharers, this change
makes the downstream cache respond with Unique even though it doesn't
deallocate the line. The requestor then becomes UD and the downstream
cache becomes UD_RU.
In the previous implementation, a loosely exclusive intermediate cache
could cause loss of dirty data. An example sequence follows.
Configurations
L2 cache: Roughly inclusive to L1 without back-invalidation
- dealloc_on_* = false
- dealloc_backinv_* = false
L3 cache: Roughly exclusive to L2 without back-invalidation
- alloc_on_readshared = true
- alloc_on_readunique = false
- dealloc_on_shared = false
- dealloc_on_unique = true
- dealloc_backinv_* = false
- is_HN = false
LLC: Same clusivity as L3 except is_HN = true
For all caches, allow_SD = true and fwd_unique_on_readshared = false
Example problem sequence:
1. L1 sends ReadUnique then becomes UD. L2 is UC_RU. L3 and LLC are RU.
2. L1 evicts the line to L2 by WriteBackFull (UD_PD). L2 becomes UD.
3. L2 evicts the line to L3 using WriteBackFull (UD_PD). L3 becomes UD.
4. L1 reads the line with ReadShared which misses on L2.
5. L2 reads the line with ReadShared, which hits on L3. L3 becomes UD_RSC
because it doesn't deallocate the line (dataToBeInvalid=false).
6. L3 evicts the line to LLC by WriteCleanFull (UD_PD) because L3 doesn't
back-invalidate and still has a sharer. The local cache line is
invalidated by Deallocate_CacheBlock. L3 becomes RUSC and LLC becomes
UD_RU.
7. When UD_RU is evicted at LLC, the UD_RU line is dropped in the
expectation that the upstream cache will write it back, causing loss of
dirty data.
Change-Id: Ic9bee27f2ec8906dd5df8bd3be60e5a9a76c782f