`assert(interruptID >=0)` is always true as `interruptID` is an unsigned
int.
This was causing compilation tests failures in GCC-8 with the following
error:
```sh
src/arch/riscv/interrupts.cc:47:32: error: comparison is always true due to limited range of data type [-Werror=type-limits]
assert(interruptID >= 0);
```
Change-Id: I356be78d7f75ea5d20d34768fb8ece0f746be2fc
This PR adds support for SQC (GPU I-cache) invalidation to the GPU
model. It does this by updating the GPU-VIPER-SQC protocol to support
flushes, the sequencer model to send out invalidates and the gpu compute
model to send invalidates and handle responses. It also adds support for
S_ICACHE_INV, a VEGA ISA instruction that invalidates the entire GPU
I-cache. Additionally, the PR modifies the kernel start behavior to
invalidate the I-cache too. It previously invalidated only the L1
D-cache.
Previously, the S_ICACHE_INV instruction was unimplemented and
simulation panicked if it was encountered. This commit adds support for
executing the instruction by injecting a memory barrier in the scalar
pipeline and invalidating the ICACHE (or SQC)
Change-Id: I0fbd4e53f630a267971a23cea6f17d4fef403d15
In ComputeUnit, a previous commit added a SystemHubEvent event class to
the SQCPort. This was found to be unnecessary during the review process
and is removed in this commit. Similarly, invBuf() which was added in
FetchUnit as part of an earlier commit was found to be redundant. This
commit removes it
Change-Id: I6ee8d344d29e7bfade49fb9549654b71e3c4b96f
Previously, the data caches were invalidated at the start of each
kernel. This commit adds support for invalidating instruction cache at
kernel launch time
Change-Id: I32e50f63fa1442c2514d4dd8f9d7689759f503d3
This commit adds support for injecting a scalar memory barrier in the
GPU. The barrier will primarily be used to invalidate the entire SQC
cache. The commit also invalidates all buffers and decrements related
counters upon completion of the invalidation request
Change-Id: Ib8e270bbeb8229a4470d606c96876ba5c87335bf
This commit adds support for cache invalidation in GPU VIPER protocol's
SQC cache. To support this, the commit also adds L1 cache invalidation
framework in the Sequencer such that the Sequencer sends out an
invalidation request for each line in the cache and declares completion
once all lines are evicted.
Change-Id: I2f52eacabb2412b16f467f994e985c378230f841
This PR removes a circular dependency between `QoSMemSinkCtrl` and
`QoSMemSinkInterface` that prevented the `controller()` function of
`QoSMemSinkInterface` from being used by removing the default value for
`QoSMemSinkCtrl.interface`.
Change-Id: I4ecc39b974e239be1a2e9285e1f6f8ea873c018d
The vlm.v and vsm.v unit-stride mask load/store instructions are
constructed with an incorrect VL when the current one is larger than
than VLEN/EEW (i.e. when LMUL > 1). This commit fixes the issue for both
instructions.
We were having some difficulty on a server running this
`apt-apt-repository` command due to suspected firewall issues. On
further inspection is appear to be superfluous as git can be obtained
easily through `apt-get` without adding this repository.
This patch provides unit-stride fault-only-first loads (i.e. vle*ff) for
the RISC-V architecture.
They are implemented within the regular unit-stride load (i.e. vle*). A
snippet named `fault_code` is inserted with templating to change their
behaviour to fault-only-first.
A part from this, a new micro based on the vset\*vl\* instructions
(VlFFTrimVlMicroOp) is inserted as the last micro in the macro
constructor to trim the VL to it's corresponding length based on the
faulting index.
This trimming micro waits for the load micros to finish (via data
dependency) and has a reference to the other micros to check whether
they faulted or not. The new VL is calculated with the VL of each micro,
stopping on the first faulting one (if there's such a fault).
I've tested this with VLEN=128,256,...,16384 and all the corresponding
SEW+LMUL configurations.
Change-Id: I7b937f6bcb396725461bba4912d2667f3b22f955
This commit allows CompData_SD be sent when ReadShared hits on UD line and
the local cache keeps the line, unless the request doesn't allow SD.
Change-Id: I337f24c871cc4c19c5b5fb11f9b35c0a8eb7911c
In case ReadShared hit on a UD line and there's no sharers, this chage
makes the downstream respond with Unique even though it doesn't deallocate
the line. This will make the requestor to UD and the downstream to UD_RU.
In the previous implementation, loosely exclusive intermediate cache can
cause loss of dirty data. Example sequence is as below.
Configurations
L2 cache: Roughly inclusive to L1 without back-invalidation
- dealloc_on_* = false
- dealloc_backinv_* = false
L3 cache: Roughly exclusive to L2 without back-invalidation
- alloc_on_readshared = tue
- alloc_on_readunique = false
- dealloc_on_shared = false
- dealloc_on_unique = true
- dealloc_backinv_* = false
- is_HN = false
LLC: Same clusivity as L3 except is_HN = true
For all caches, allow_SD = true and fwd_unique_on_readshared = false
Example problem sequence:
1. L1 sends ReadUnique then becomes UD. L2 is UC_RU. L3 and LLC are RU.
2. L1 evicts the line to L2 by WriteBackFull (UD_PD). L2 becomes UD.
3. L2 evicts the line to L3 using WriteBackFull (UD_PD). L3 becomes UD.
4. L1 reads the line with ReadShared which misses on L2.
5. L2 reads the line with ReadShared which hits on L3. L3 becomes UD_RSC
because it doesn't deallocate the line (dataToBeInvalid=false)
6. L3 evicts the line to LLC by WriteCleanFull (UD_PD) because L3 doesn't
back-invalidate and still has sharer. The local cache line is
invalidated by Deallocate_CacheBlock.
L3 becomes RUSC and LLC becomes UD_RU.
7. When UD_RU is evicted at LLC, the UD_RU line is dropped expecting the
upstream to writeback, causing loss of dirty data.
Change-Id: Ic9bee27f2ec8906dd5df8bd3be60e5a9a76c782f
In Ruby CHI protocol UD_RU state means the line is in UD state in
the local cache and the upstream may have it in UD or UC state.
In the previous implementation UD_RU line was just dropped without
WriteBack which can cause loss of dirty data when the upstream has it
in UC state.
This commit fixes it by performing WriteBack when evciting UD_RU line.
Change-Id: I1db9b4f95cc576e71dcef38b01de24775df514ba
Vector unit-stride instructions have an EEW encoded directly in the instruction,
We should use that instead of SEW in vtype.
Change-Id: I282041ce8ed57fbcca899f7497ef6c6fb2dfcf85
Besides the standard RISC-V interrupts software, timer, and external
interrupt, the RISC-V specification also offers the possibility to
implement local interrupts. With this patch, we contribute an extension
of RiscvInterrupts that enables connecting interrupt sources to the
local interrupt controller. We assigned the local interrupts to
machine-level and gave them the highest priority. If two local
interrupts are pending, there exception code will be the tie-breaker
(higher ID > lower ID). 32 Bit systems only recognize the local
interrupts 16 to 31, 64 Bit systems 16 to 63.
Change-Id: Iff8d34e740b925dce351c0c6f54f4bd37a647e0c
---------
Co-authored-by: Robert Hauser <robert.hauser@uni-rostock.de>
This allows us to manually trigger daily test runs rather than wait for
the scheduled time. This can be useful in cases where a fix for a broken
test is pushed and we wish to verify it works as intended ASAP.
This change allows pyunit tests to be run on specific directories
instead of the default `pyunit` directory.
You can pass in the directory as follows. I have built gem5.opt for
RISCV however it should work the same with other builds
```
./build/RISCV/gem5.opt tests/run_pyunit.py --directory tests/pyunit/gem5/
```
The default path works as it is currently
```
./build/RISCV/gem5.opt tests/run_pyunit.py
```
Change-Id: Id9cc17498fa01b489de0bc96a9c80fc6b639a43f
Signed-off-by: Suraj Shirvankar <surajshirvankar@gmail.com>
The RISC-V privilege spec don't specify the implementation of
PMA(physical memory attribute), which is addressed in the previous
CL[1].
This CL creates the BasePMAChecker to support customized PMA so that we
can only focus on the features wanted in the study. The CL also leaves
the common methods `check` and `takeOverFrom` to make MMU easy to
interact with PMA.
[1] https://gem5-review.googlesource.com/c/public/gem5/+/40596
Change-Id: I9725e3a8f7f9276e41f0d06988259456149d2a77
Crypto instructions will cause an undefined instruction when executed
with SIMD disabled. The PR is also
refactoring their implementation by checking the release object instead
of the ID register field. This is improving
readability
The backdoor request in b_transport is only used for hinting the dmi
capability. Since most of traffic patterns are continous, we can cache
the previous backdoor request result to spare the backdoor inspect of
next request.
Change-Id: I53c47226f949dd0be19d52cad0650fcfd62eebbc
This commit adjusts the logic in VectorFloatMaskMacroConstructor to
ensure the %(copy_old_vd)s section is not skipped when vl = 0, ensuring
correct values in destination vector register.
Change-Id: I2478722d6f003a0f2e4b3cd0ba3e845bed938ee6
This is the same problem as #715 .
We not only check for the presence of the relative FEAT_*,
we also check if AdvSIMD is enabled; we throw an undefined
instruction otherwise.
Change-Id: I1fd0cdc8057c5a7901774802dc076817f06c8e66
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Check directly if extension is enabled instead of looking
for ID register field value. This makes the code more readable
Change-Id: If0b882ac3464c3587731b72a7edb3b8b65ea86c7
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Having the number of physical registers matching exactly the number of
architectural ones does not guarantee a proper execution as it means the
freeList would have 0 registers available for renaming. In this case the
worst would happen: renaming would silently stall execution
indefinitely. With this change we report the issue to the user and fail
execution
Change-Id: I1eb968802f1a1a5115012f44b541542a682f887d
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The method is extracting the minimum number of [1] non-zero free
registers/entries across all register classes. This means that if we
have saturated all register storage for a particular class, renaming
will stop as a whole.
I believe it does make sense to keep renaming and only block renaming in
case an instruction requiring the particular register type is
encountered. This would happen with the Rename::renameInsts method
[1]: https://github.com/gem5/gem5/blob/stable/src/cpu/o3/rename_map.hh#L269
[2]: https://github.com/gem5/gem5/blob/stable/src/cpu/o3/rename.cc#L662
Change-Id: I932826a77a5c0b2e05d8fdcab0e6ca13cf0e3d23
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This is working around an existing SMT issue [1].
The BaseO3CPU uses two physical matrix registers [2]. This is
enough for a single threaded CPU which as of now uses
1 architectural matrix only.
The problem arises when SMT is enabled. As 2 architectural matrices
need to be supported by a single CPU, the O3CPU won't have any available
register in the freeList for renaming. This causes the SMT O3CPU to
indefinitely stall renaming [3]
If the archtectural number of registers is seto to 0, the regclass won't
be taken into consideration when evaluating if we can rename
instructions.
This issue has been implicitly fixed for RISCV by a preceding PR [4]
[1]: https://github.com/gem5/gem5/issues/668
[2]: https://github.com/gem5/gem5/blob/stable/src/cpu/o3/BaseO3CPU.py#L170
[3]: https://github.com/gem5/gem5/blob/stable/src/cpu/o3/rename.cc#L1228
[4]: https://github.com/gem5/gem5/pull/83
Change-Id: I99bfdefff11a246b1f191251dc67689e95b3f0db
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This is more complaint with the VMSAv8-64, which is using Translation
Regimes instead of
historical (Armv7) isHyp tagging and the ExceptionLevel managing the
translation. This greatly
simplifies translation code, specially with FEAT_VHE where the managing
el (EL2) could handle to different
translation regimes (EL and EL2&0).
The PCI configuration space is 256 bytes, yet because the
PCI_CONFIG_SIZE macro is 0xff, the final register allocation in the IDE
controller only allocated up to byte 255.
Change-Id: I1aef2cad9df366ee8425edb410037061eb29ae33