For scratch instructions only, this bit specifies if an offset in a VGPR
should be used for address calculation. This is new in MI300 and was
previously the LDS bit. The LDS bit is rarely used and in fact gem5 does
not even check this bit.
This fixes a bug when SADDR == 0x7f (i.e., no SGPR should be used) where
a VGPR was being added to the address when it should have been ignored.
Change-Id: I9864379692df6795b25b58b98825da05d18fc5db
A load instruction can be replayed when
1) it's strictly ordered or
2) it falls into load-store forwarding mismatch.
Case 1 was considered in executeLoad function but the case 2 wasn't. It
causes the case-2 replayed load instruction to violate the assertion
condition "assert(!load_inst->isExecuted())" in LSQUnit::read. This
commit fixes the problem by adding consideration of the case 2 in
LSQUnit::executeLoad.
Co-authored-by: Minje Jun <minje.jun@samsung.com>
The GPUFS scripts include support for dumping and resetting
stats at kernel boundaries by identifying specific GPU kernel
exit events. This commit extends that support to work with
GPU SE-mode support.
Change-Id: I662233ae71e2987d90af1fd0100e29036b2ef1c6
The IsInvalid flag indicates that the static instruction is not part of
the executing ISA and not part of m5's pseudo-instructions. This flag
provides a way to recognize an illegal instruction at the decode stage.
GPU_VIPER.py was modified to use these options but they did not exist,
breaking GPUFS. This commit adds them to fix the issue.
Change-Id: I0095f400ea606c4e8d91a41870ef208465cef803
When a compute unit issues several requests to the same line,
the requests wait in the L2 if it is a writeback cache. If the line is
invalid initially and the first request is atomic in nature, the L2
cache issues a request to main memory. On data return, the cache line
transitions to M but doesn't wake up the other requests, resulting in
a deadlock. This commit adds a wakeup call on data return for atomics
and fixes potential deadlocks.
This PR incorporates numerous improvements and fixes to the gem5
PyStats. This includes:
* PyStats now support SimObject Vectors. The PyStats representing them
are subscribable and therefore acceptable by accessing an index: e.g.,:
`simobjectvec[0]`. (This replaces the `Vector` group PyStat)
* Adds the `SparseHist` PyStats.
* Adds the `Vector2d` to PyStats.
* The `Distribution` PyStats is fixed to be a vector of Scalars.
* Tests added for the PyStat's Vector and bugs fixed.
'v4.0.0' wasn't working. The following error was occurred:
```
Can't find 'action.yml', 'action.yaml' or 'Dockerfile' for action 'actions/upload-artifact/merge@v4.0.0'.
```
Change-Id: I658b0fe292df029501fbc1286acb06f4014ae4e1
This commit adds instSeqNum to the atomic responses in
GPU_VIPER-TCC.sm. This will be useful when debugging issues related to
GPU atomic transactions
Change-Id: Ic05c8e1a1cb230abfca2759b51e5603304aadaa3
When a compute unit issues several requests to the same line,
the requests wait in the L2 if it is a writeback cache. If the line is
invalid initially and the first request is atomic in nature, the L2
cache issues a request to main memory. On data return, the cache line
transitions to M but doesn't wake up the other requests, resulting in
a deadlock. This commit adds a wakeup call on data return for atomics
and fixes potential deadlocks.
Change-Id: I8200ce6e77da7c8b4db285c0cc8b8ca0dfa7d720
Adding RP_choose functions to change replacement policies among
TreePLRU, LRU, FIFO, LFU, LIP, MRU, NRU, RRIP, SecondChance AND ShiPMem
replacement
policies for TCC, TCP and SQC caches for GPU
The IsInvalid flag indicates that the static instruction is not part
of the executing ISA and not part of m5's pseudo-instructions. This
flag provides a way to recognize an illegal instruction at the decode
stage.
Change-Id: I2779c6edcd8c5e6a77ea11cad3ff73bacb79d800
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
Adding RP_choose function to change replacement policies among
TreePLRU, LRU, FIFO, LFU, LIP, MRU, NRU, RRIP, SecondChance AND ShiPMem replacement
policies for TCC, TCP and SQC caches for GPU
Change-Id: I86cc41cca19f8e0d24d8cf015e2e034a1fc4bc43
Adding RP_choose functions to change replacement policies among
TreePLRU, LRU, FIFO, LFU, LIP, MRU, NRU, RRIP, SecondChance AND ShiPMem replacement
policies for TCC, TCP and SQC caches for GPU
Adding RP_choose function to change replacement policies among
TreePLRU, LRU, FIFO, LFU, LIP, MRU, NRU, RRIP, SecondChance AND ShiPMem replacement
policies for TCC, TCP and SQC caches for GPU
Adding RP_choose functions to change replacement policies among
TreePLRU, LRU, FIFO, LFU, LIP, MRU, NRU, RRIP, SecondChance AND ShiPMem replacement
policies for TCC, TCP and SQC caches for GPU
Adding RP_choose functions to change replacement policies among
TreePLRU, LRU, FIFO, LFU, LIP, MRU, NRU, RRIP, SecondChance AND ShiPMem replacement
policies for TCC, TCP and SQC caches for GPU
When accessing memory using functionalAccess(), the MMU could tell us to
use a nonsecure access even though the CPU is operating in secure mode.
I noticed this while trying to run a simple semihosting hello world with
the MMU+caches enabled and the semihosting calls ended up reading from
memory instead of the caches due to an S/NS mismatch.
See also https://github.com/gem5/gem5/pull/1198 which happens to also
mask the issue I saw, but I believe both changes are needed.
Change-Id: I9e6b9839b194fbd41938e2225449c74701ea7fee
Some arithmetic instructions of the riscv vector extension where still
using the default VLEN=256 instead of the dynamic one through the
inherited `vlen` attribute. Most of them only use this to calculate the
effective index for the mask element like so:
```
uint32_t ei = i + vtype_VLMAX(vtype, vlen, true) * this->microIdx;
if (this->vm || elem_mask(v0, ei)) {
...
```
This means that instructions will wrongly compute the mask index in the
second and subsequent micro instructions (`microIdx` > 0). This commit
fixes this by adding the corresponding `set_vlen` snippet to the
affected instruction formats.
Change-Id: Ib041de972d6938490741a9fb4c214a6a5172c34e
While running a simple Arm32 binary, I noticed that all memory
transactions were being marked as NS instead of S once I turn on the MMU
(even though the page tables have the NS bit set to zero). The result of
this was that semihosting calls were failing since they were using
functional accesses with the SECURE flag set, but the caches only
contained NS tagged entries so these accesses always read stale values
from DRAM.
Digging through the Arm MMU code it appears that the NS bit lookup was
being keyed of the `secureLookup` flag which is only used for long
descriptors. I believe 0c28712f51 should
have used isSecure instead of secureLookup. To avoid using these
uninitialized values in the future I wrapped the LPAE state in a
std::optional to ensure that it is only accessed once initialized.
Change-Id: Ibc406ed3f4cfa768f470e34a5eca3c1a2bf45cd8
The SYS_WRITEC and SYS_WRITE0 calls are specified as writing to the
debug channel, so it is a reasonable expectation for these messages to
be visibile immediately after the semihosting call.
Change-Id: I8e6e9a7aab593a59e82ecb9cf4603c18c7a8acbe
Clang warns as follows: `warning: definition of implicit copy
constructor for 'TranslResult' is deprecated because it has a
user-declared copy assignment operator`
Change-Id: Ic701d8522aac75d569f4f513f54de91f76a17e48
`src/dev/virtio/VirtIORng 2.py` is identical to
`src/dev/virtio/VirtIORng.py`, and the former does not appear in any
build script.
Change-Id: I9c5f1b1a3809d1c7028b630c32310e540613e232
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
Fixed the assertion statement in the cpu's translation.hh file so that
it doesn't fail the assertion if the cache is clean.
I compile this c code to `test`
```c
#include <stdio.h>
static inline void clflush(volatile void *p) {
__asm__ volatile ("clflush (%0)" : : "r"(p) : "memory");
}
int main() {
int data = 42; // Example variable
printf("Value before clflush: %d\n", data);
clflush(&data);
printf("Value after clflush: %d\n", data);
return 0;
}
```
And run it with this script
`./build/X86/gem5.opt configs/learning_gem5/part1/two_level.py ./test`
In order to verify that it no longer fails the assertion check.
GitHub Issue: #862
Change-Id: I6004662e7c99f637ba0ddb07d205d1657708e99f
This feature has been available since Vega10 but was never implemented.
MI300 adds a few new instructions that make use of this more often
(e.g., v_mov_b64).
Change-Id: Ieeb7834462b76d77c0030f49622d0de09f90c9e4
This instruction is a simple move from accumulation register to
accumulation register. It is essentially a move with the accumulation
offset added to the register index.
Change-Id: Ic93ae72599b75c91213f56ebafe5bbd7b2867089
Flat, scratch, and global share the same instruction implementation with
different address calculations essentially. These instructions were
already implemented but not added to the decoder. This commit adds the
remaining scratch instructions which have a shared instruction
implementation.
Change-Id: I8f2e9ceb221294dce1b81c45745b642f0592d985
The lines `constexpr int B_I = std::ceil(64.0f / (N * M / H));` caused
the following compilation error in clang Version 16:
```
error: constexpr variable 'B_I' must be initialized by a constant
expression
```
`std::ceil` is not a const expression. Therefore instances of this
expression in instructions.hh have been replaced with a constant
expression friendly alternative.
This is calling our compiler tests to fail:
https://github.com/gem5/gem5/actions/runs/9288296434/job/25559409142
Change-Id: I74da1dab08b335c59bdddef6581746a94107f370
Currently when data is downgraded by MOESI_AMD_Base-CorePair (e.g. due
to a replacement) this requires a 4-way handshake between the CorePair
and the dir. Specifically, the CorePair send a message telling the dir
it'd like to downgrade then, the dir sends an ACK back and then, the
CorePair writes the data back, and finally, the dir ACKs the writeback.
This is very inefficient and not representative of how modern protocols
downgrade a request. Accordingly, this commits updates the downgrade
support such that the CorePair writes back the data immediately and then
the dir ACKs it.
Thus, this approach requires only a 2-way handshake.
Change-Id: I7ebc85bb03e8ce46a8847e3240fc170120e9fcd6
Co-authored-by: Neeraj Surawar <neerajs@hyrule.cs.wisc.edu>