Add declaration of HAFGRTR_EL2 registers and read/write as GPR.
Change-Id: I87570d1e87d479f4530cf2c6e05931cdc26ee361
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This commit contains the rest of the base 2 vs base 10 cache/memory
size clarifications. It also changes the warning message to use
warn(). With these changes, the warning message should now no
longer show up during a fresh compilation of gem5.
Change-Id: Ia63f841bdf045b76473437f41548fab27dc19631
This commit adds a warning message for when cache or memory sizes
will be automatically converted from metric units (e.g. kB) to
binary units (e.g. KiB).
Change-Id: I4ddf199ff2f00c78bbcb147e04bd88c496fa16ae
This commit changes files in src/python/gem5 so that memory and
cache sizes use "KiB", "MiB", and "GiB" instead of "KB", etc. This
makes the codebase more consistent, as gem5 will automatically
convert memory and cache sizes that are in metric units (KB) to
binary units (KiB).
Change-Id: If5b1e908ddcff7182b71e789229a3ba1fa6ad1f1
We don't store a pointer to the indexing policy anymore.
Instead, we register a tag extractor callback when we
construct the TaggedEntry
Change-Id: I79dbc1bc5c5ce90d350e83451f513c05da9f0d61
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
We don't store a pointer to the indexing policy anymore.
Instead, we register a tag extractor callback when we
construct the CacheEntry
Change-Id: I06dc58e2f67e01f3f9bcd9f0c641505d3aec82ff
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
As detailed by a previous commit, AssociativeSet is not needed anymore.
The class is effectively the same as AssociativeCache
Change-Id: I24bfb98fbf0826c0a2ea6ede585576286f093318
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The only difference between the TaggedEntry and the newly defined
CacheEntry is the presence of the secure flag in the first case. The
need to tag a cache entry according to the security bit required the
overloading of the matching methods in the TaggedEntry class to take
security into account (See matchTag [1]), and the persistance after
PR #745 of the AssociativeSet class which is basically identical
to its AssociativeCache superclass, only it overrides its virtual
method to match the tag according to the secure bit as well.
The introduction of the KeyType parameter in the previous commit
will smoothe the differences and help unifying the interface.
Rather than overloading and overriding to account for a different
signature, we embody the difference in the KeyType class. A
CacheEntry will match with KeyType = Addr,
whereas a TaggedEntry will use the following lookup type proposed in this
patch:
struct KeyType {
Addr address;
bool secure;
}
This patch is partly reverting the changes in #745 which were
reimplementing TaggedEntry on top of the CacheEntry. Instead
we keep them separate as the plan is to allow different
entry types with templatization rather than polymorphism.
As a final note, I believe a separate commit will have to
change the naming of our entries; the CacheEntry should
probably be renamed into TaggedEntry and the current TaggedEntry
into something that reflect the presence of the security bit
alongside the traditional address tag
[1]: https://github.com/gem5/gem5/blob/stable/\
src/mem/cache/tags/tagged_entry.hh#L81
Change-Id: Ifc104c8d0c1d64509f612d87b80d442e0764f7ca
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
As long as the AssociativeCache Entry parameter satisfies the
interface it should be fine. We enforce the bare minimum of having
a replaceable entry.
Doing otherwise will restrict our capability to have a generic cache
with generic tags
Change-Id: I23e32b7540fea6b6e5894aca3d91538e81214932
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The KeyType data type is the type of the lookup and the cache extracts
it from the Entry template parameter
Change-Id: I147d7c2503abc11becfeebe6336e7f90989ad4e8
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This commit is making the AssociativeCache indexing policy
a type extracted from the Entry template parameter
Change-Id: Ic9fb6ccb1b3549aaa250901e91ae3c300b92103e
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Exposing the tag of a cache entry through the associative
cache APIs makes it hard to generalize the cache for
structured tags. Ultimately the tag should be a property
of the cache entry and any tag extraction logic (if needed)
should reside there. In this we can reuse the associative
cache for different Entry params, each one bearing a different
representation of a tag
Change-Id: I51b4526be64683614e01d763b1656e5be23a611b
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Some compilers (gcc version 12.3.0) will start complaining when perfect
forwarding the StrideEntry argument constructed with an extra parameter
(see later patches).
Using a pointer seems to fix the gcc bug.
The commit is also changing the signature of findTable and allocateContext
so that a reference rather than a pointer is return. In this way we don't
deal with the hack of returning a raw ptr from a unique_ptr
Change-Id: Idd451208aae80bbfae76110c859e93084bcb2635
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The code is already assuming a fully associative cache. Rather than
calling getPossibleEntries with a random value and therefore needlessly
passing a vector of pointers, we use the AssociativeCache iterator to
loop over the cache entries
Change-Id: Ic99cbd39ee9f12eef9091d9d62ca24d0c3e61300
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
- Updated search query so that resources that are not compatible with
the gem5 version are still downloaded and used but a warning is thrown
instead of returning an error.
In `get_default_kernel_root_val()`, now prioiritizes the explicit
disk_device passed from the user over the default implemented by the
board.
Also adjusts syntax for selecting this value in
`set_kernel_disk_workload()` for consistency.
It seems that the common use case for setting `disk_device` is that
there is a mismatch between where the disk image is mounted and where
the board expects it by default. In this case, it also seems common that
the root partition will be on this explicit device as well.
In cases where this is not true, explicit kernel arguments can be used
to define the distinct disk device apart from the root. However, this
seems less common than the above so, in that case, it would be easier to
tie these together.
Add declaration of HAFGRTR_EL2 registers and read/write as GPR.
Change-Id: I87570d1e87d479f4530cf2c6e05931cdc26ee361
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This PR fixes the issues with the implementation of the Best Offset
Prefetcher described in issue #1402
On branch bop
Changes to be committed:
modified: src/mem/cache/prefetch/bop.cc
modified: src/mem/cache/prefetch/bop.hh
---------
Co-authored-by: Setu Gupta <setu.gupta.2020@gamil.com>
Co-authored-by: Abhishek Shailendra Singh <abs218@leigh.edu>
Co-authored-by: Setu Gupta <setu.gupta@partner.samsung.com>
Some syscalls were incorrectly using 64 bit
integers instead of VPtr's guest pointers,
causing parameter value corruption. This
commit addresses this issue.
Change-Id: If9e27a7c776b802dda18979d1a83a76c23557359
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Same as with the off_t, some syscalls were using
incorrect size parametres in place of a guest-defined
size_t. This commit changes the signature of said
syscalls and adds the size_t typedef to the
arch-dependent Linux OSs.
Change-Id: Iece43814971a8e6275d25f6789e41528d241d1f4
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Some system calls were using incorrect sizing for
offset parametres, which was causing the ABI to pass
wrong values due to size mismatches. One such syscall
is lseek, which in the Arm syscall table was
incorrectly marked as llseek, which does not exist
in aarch64 Linux. In addition, the off_t alias for
general Linux was changed from an unsigned to a
signed type, to accurately reflect the behaviour
in the real-life Linux operating system.
Change-Id: Iada4b66a8933466c162ba9ec901dbdae73c73a18
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This commit either adds the implementation or the ignoreFunc
to the corresponding entry in the syscall table for
some Arm syscalls that were required in order to test
the fix for the incorrect parameter size bug in se mode.
Change-Id: Ifc6d87e2decf1bf96ecd81de6690f92927377bf8
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Since PR #1316, we use sign-extend for all address generation, including
PC, to match the ISA specification for modifiable XLEN. However, when we
set a breakpoint using remote GDB, our address is not sign-extended.
This causes the breakpoint to be set at the wrong address, as specified
in Issue #1463. This PR fixes the issue by sign-extending the address
when setting a breakpoint. This also matches the RISC-V ISA
Specification that "must sign-extend results to fill the entire widest
supported XLEN in the destination register."
Change-Id: I9b493bf8ad5b1ef45a9728bb40fc5e38250fe9c3
Signed-off-by: Yangyu Chen <cyy@cyyself.name>
In `get_default_kernel_root_val()`, now prioiritizes
the explicit disk_device passed from the user over the
default implemented by the board.
Also adjusts syntax for selecting this value in
`set_kernel_disk_workload()` for consistency.
Change-Id: Icddcf438f5b96c2288c3cc608782f191df2c394e
After a lot of debugging and comparing traces I noticed that vrintp was
giving different results from QEMU. An input of 0x3f800000 (1.0) was
being passed to the fplib helpers as (uint32_t)1 which has a completely
different floating-point interpretation and the result was therefore
completely wrong.
I've fixed this as well as all remaining implicit float-to-int
conversions in the ARM instruction execution. There are more
-W(implicit-)float-conversion warnings in the other executors, but for
now this fixes the issue I was seeing.
Change-Id: Ifdeee745ca155d7f4504ac4c54235ac431acdeb9
This PR fixes the issues mentioned in #1448.
**Note that this contribution is the result of a joint collaboration
with @AbhishekUoR**
This PR introduces the following 4 changes:
1. It changes the addresses which are used to compute the stride to
cache line aligned addresses (the current version uses word aligned
addresses)
2. It correctly returns if the stride does not match (as opposed to
issuing prefetches using the new stride incorrectly)
3. It returns if the new stride is 0, indicating multiple reads from the
same cache line.
4. It removes code which is no longer necessary after the addition of
changes number 1 and 3.
Change-Id: Ic346d0e15df6d07e2b93289c8d6b89b4c2f45a34
---------
Co-authored-by: Abhishek Shailendra Singh <abs218@leigh.edu>
This was found while comparing a diverging execution against QEMU traces
and checking for the first mismatched program counter. Fortunately this
was
caused by a branch shortly after this incorrect computation but still
took
a long time to track down.
There are two issues here: the decoder had inverted the cases for *S and
*A,
and the sign bit was wrong for VFN*.
This PR has 3 commits:
- Update scaling methods to scale by multiplication or division when
upcasting or downcasting respectively.
- Preserve the sign when a microscaling conversion results in NaN or
infinity to match hardware.
- Rework rounding to handle cases where conversion results in a denormal
number in the output type so that the value is correct.
Scratch memory requests that are larger than one dword are using a
different memory layout than global instructions. Rather than being
placed contiguously, each dword is interleaved 64 lanes * 4 bytes away
as described in Section 9.1.5.2. "Swizzled Buffer Addressing" in the
MI300 specification. This was verified by comparing MI300 output (which
uses scratch_ instructions) with MI200 (which uses buffer instructions).
MI300 FashionMNIST bs=1 now matches CPU reference.
This requires several changes to the instruction implementations:
- For stores, data in the GPUDynInst can be swizzled before the data is
written to memory. This is easy to do using a helper method. This is
done in the template<int N> variant of initMemWrite. To use this x2
stores are changed to use template<int N> rather than loading a U64. The
swizzle function is renamed to swizzleAddr to avoid confusion with
swizzleData.
- For loads, data is unswizzled in completeAcc when writing register
values. This is not as easy to implement as a helper and is thus
implemented for the three load instructions that load more than one
dword.
- Accessing swizzled data requires at least one packet per dword. A new
GPU memory helper is added to create these packets for scratch requests
specifically. This is called in the template<int N> variant of
initMemRead / initMemWrite. Loads and stores of x2 are changed to use
this variant instead of accessing a U64.
The GPUDynInst status vector restrictions are increased to allow for
swizzled x4 accesses. For simplicity this does not currently support
misaligned swizzled accesses and will panic upon seeing such a case.
Change-Id: Ic686c51e28e0af029a043d5a5b3d4069f2cb94f9
The current implementation does not correctly convert subnormal numbers
(number that fill the underflow gap around zero in floating-point
arithmetic). This commit reworks the rounding code to get correct
results.
First, the min_exp is set to 0 which allows for numbers to become
subnormal when rounding. Second, the rounding code now uses something
closer to "GRS" rounding (guard, round, sticky) which represent the
first bit removed when rounding to a smaller type, the next second bit
removed, and whether any of the other bits removed are one. More details
can be found in the code comments.
Change-Id: Idcd2f1e4383e4012fc3abf73b1f73c847d44f67b
The implementation of microscaling formats uses the Open Compute Project
specification which includes a sign bit for NaN and infinity. This
should be preserved when a conversion results in NaN or infinity.
Change-Id: Id9e99324c6486e256c699016aff301d5f06814d5
Currently there is only a scale() method which multiplies a microscaling
type by an int8 value. This should only be applied when upcasting to
a larger type after conversion to match hardware. When downcasting to a
smaller type, the scaling method should divide by the int8 value before
conversion.
This commit adds both scaling methods.
Change-Id: Ibafa8caa389cde4df609e536cd53bd2289959420
At the moment, a hart does not halt if there are pending interrupts.
However, an implementation can also consider the enable status of the
individual interrupts, i.e., a halted hart would only resume if there
are locally enabled pending interrupts. This commit introduces this
behavior. The wfi behavior is controlled by the new configuration
variable wfi_pending_resume of RiscvISA.
Change-Id: I316239f9732c6e73e6ad692491bca08d773dd995
---------
Signed-off-by: Robert Hauser <robert.hauser@uni-rostock.de>
Functional writes atomically update all copies of a data block, so they
should invalidate any pending LL/SC locks, just like a conventional
write would.
Change-Id: Ic79d2d8d24901f1b6a2ce81dc0e2decc84c0ebbc
Added documentation for `strided_generator.py` and
`strided_generator_core.py.`
Updated clarity of documentation for `linear_generator.py`,
`linear_generator_core.py`, `random_generator.py`, and
`random_generator_core.py`.
Made `max_addr` exclusive instead of inclusive for strided and linear
traffic generation in `strided_gen.cc` and `linear_gen.cc`.
These commits primarily fix the SDMA engine which was (1) using pointer
arithmetic on a variable returned by new and then attempting to free the
modified pointer and (2) using a buffer after it was freed due to the
DMA device calling completion event before Ruby actually completed.
Some minor fixes are included: Stop using uninitialized value as packet
context and using same request pointer for two separate packets for GPU
invalidations.
In gem5, we use the same code base for RISC-V 32 and 64.
However, if we need to allow modifiable XLEN control on CSR.mstatus in
the future, we should follow the RISC-V ISA manual to sign-extend all
the register results, including PC and GPR. If this feature implemented,
the simulator needs to handle user-mode in RV32 but CSR.SATP sets to
Sv39. In this case, 0x80000000 and 0xffffffff80000000 are different
addresses in the 64-bit S-Mode perspective, but they are the same in the
32-bit U-Mode perspective. We should avoid this wrong behavior happening
before we implement this feature.
Thus, we need to sign-extend the results of all the addresses, including
the PC and memory addresses, which currently use zero-extend. As
specified in the RISC-V ISA manual, we use zero-extend in narrow XLEN
mode for the physical address implemented in TLB.
Changes based on spec:
1. Sign-extend narrow XLEN:
https://github.com/riscv/riscv-isa-manual/blob/riscv-isa-release-b7a445a-2024-07-02/src/machine.adoc?plain=1#L567
2. Zero-extend physical address:
https://github.com/riscv/riscv-isa-manual/blob/riscv-isa-release-b7a445a-2024-07-02/src/supervisor.adoc?plain=1#L1670
Signed-off-by: Yangyu Chen <cyy@cyyself.name>
The SQC and TCC invalidations share a Request pointer which they both
modify. This can cause some problems, so use a different request pointer
for each invalidate. The setContext call is also removed as the value
being assigned to it is uninitialized.
Change-Id: I82ea7aa44a4f4515c1560993caa26cc6a89355af
SDMA packets which use dmaVirtWrites call their completion event before
the write takes place in the Ruby protocol. This causes a use-after-free
issue corruption random memory locations leading to random errors. This
commit adds a cleanup event for each packet that uses DMA and sets the
cleanup latency as 10000 ticks. In atomic mode, the writes complete
exactly 2000 ticks after the completion event is called and therefore a
fixed latency can be used. This is not tested with timing mode, which
does not work with GPUFS at the moment, so a warning is added to give an
idea where to look in case the same issue occurs once timing mode is
supported.
Change-Id: I9ee2689f2becc46bb7794b18b31205f1606109d8
The SDMA engine copies data in chunks. It currently uses the pointer
returned from new[] and manipulates it using pointer arithmetic. This
modified pointer is then passed to the completion function which deletes
the pointer. Since it is not the original pointer allocated by new[]
this triggers issues in ASAN.
Change-Id: I03ccf026633285e75005509445c62fcbda8eb978
This change changes the addresses that are printed when TrafficGen
DebugFlag is enabled. Previously, hex strings were printed without a
preceding 0x. This change fixes that to distinguish between decimal and
hex.
This commit removes some comments and adds constexpr in front
of "bool is_secure..." in pif.cc, signature_path.cc, and
signature_path_v2.cc
Change-Id: Icafe1d7c97d1d3fbf6abc12ba87ebb596255b96f