This commit adds support for vector unit-stride segment load operations
for RISC-V (vlseg<NF>e<X>). This implementation is based in two types of
microops:
- VlSeg microops that load data as it is organized in memory in structs
of several fields.
- VectorDeIntrlv microops that properly deinterleave structs into
destination registers.
Gem5 issue: https://github.com/gem5/gem5/issues/382
With the introduction of multi-ISA gem5, we don't store the TARGET_ISA
anymore as a string in the root section of the checkpoint [1]. There is
therefore no way at the moment to asses the ISA of a CPU/ThreadContext.
This is a problem when it comes to checkpoint updates which are ISA
specific.
By explicitly serializing the ISA as a string under the cpu.isa section
we avoid this problem and we let cpt_upgraders be aware of the ISA in
use.
[1]: https://gem5-review.googlesource.com/c/public/gem5/+/48884
A set of cpt_upgraders was patching old checkpoints regardless
of the ISA in use. Thanks to the previous patch, we can now
retrieve the ISA of the CPU from the isa section.
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Change-Id: Ia110068c06453796cefac028ee13f21667e5371a
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
With the introduction of multi-ISA gem5, we don't store the TARGET_ISA
anymore as a string in the root section of the checkpoint [1]. There is
therefore no way at the moment to asses the ISA of a CPU/ThreadContext.
This is a problem when it comes to checkpoint updates which are ISA
specific.
By explicitly serializing the ISA as a string under the cpu.isa section
we avoid this problem and we let cpt_upgraders be aware of the ISA in
use.
[1]: https://gem5-review.googlesource.com/c/public/gem5/+/48884
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Change-Id: I1e75230cbc370cab84f4a54141b1e425af2dbfac
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
This commit adds statistics showing completed page walks for 4KB and 2MB
pages. This will add to stats.txt the variables num_4kb_walks,
num_2mb_walks and the corresponding values. This is done based on the
level of page table walk traversed specific to Sv39 Virtual Memory
System.
* Add Cache partitioning policies to manage and enforce cache
partitioning:
* Add Way partition policy
* Add MaxCapacity partition policy
* Add PartitionFieldsExtension Extension class for Packets to store
Partition IDs for cache partitioning and monitoring
* Modify Cache SimObjects to store partition policies
* Modify Cache block eviction logic to use new partitioning policies
Co-authored-by: Adrian Herrera <adrian.herrera@arm.com>
Change-Id: Ib35153a8b46803c22a433926270d82e5e19ce544
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This pull request contains a set of small patches which fix some bugs in
the gem5 prefetchers, and aligns out-of-the box prefetcher performance
more closely with that which a typical user would expect.
The performance patches have been tested with an out-of-the-box
(untuned) Stride prefetcher configuration against a set of SPEC 2017
SimPoints, and show a small-to-modest IPC uplift across the about half
the benchmarks, with no significant IPC degradation.
The new defaults were identified as part of work on gem5 prefetchers
undertaken by Nikolaos Kyparissas while on internship at Arm.
This PR is an updated version of PR #564, which was reverted due to Bug
#580. Bug #580 was fixed in PR #871. This PR updates #564 to the latest
state of the develop branch, and should be applied after PR #871.
Adding an error message in case the binary is not compatible with gem5.
This PR is addressing the comments in issue #807.
Change-Id: I66466ed6f657276c13d237fde3b1ec12c20cfe91
Previously merged PR #886 created pic.hart_config, but it was not
initialized properly in lupv_board.py. This issue is causing daily tests
to fail.
Change-Id: I193ff4a3e5ef787eefcf066404e762f024fa6603
---------
Co-authored-by: Yu-Cheng Chang <aucixw45876@gmail.com>
One of the limitations of the RegBank class is that it does not allow
you to pass a non-contiguous set of registers. Its simplest form will
just accept an initializer list of registers and it will store them in
sequence.
A more refined version [1] will optionally accept an offset value to be
passed alongside the register reference. This is not meant to be used by
the register bank to store the register at the provided offset.
It is rather used by the bank to sanity check the register sits exactly
at the provided range.
The way to work around this for a fragemented register space is to
explicitly allocate RAZ/RAO blocks as registers and to pass them to
addRegisters together with the others. (See the SysSecCtrl [2] as an
example)
This makes it a bit tedious to model a register bank with gaps between
its registers. First, the exact number and position of the gaps needs to
be extraced from a spec. These sometimes report only implemented
registers and their offset, and omit to document gaps/reserved space. So
a developer needs to manually add register offset and size to check if
all registers are contiguous. Second, these reserved register blocks
need to be instantiated in the bank adding boilerplate code and
affecting readibility.
For these reasons we add a new registration method, called
addRegisters*At*. It reuses the RegisterAdder class but this time the
offset field is really used to instruct the bank where the register
should be mapped. The method is templated and the template parameter
tells the bank which register type should be used to fill the remaining
space. We make the RegBank the owner of this filler space (registers are
generated internally within addRegistersAt).
[1]: https://github.com/gem5/gem5/blob/stable/src/dev/reg_bank.hh#L106
[2]: https://github.com/gem5/gem5/blob/stable/src/dev/arm/ssc.cc#L48
Change-Id: I614ae6e9eeb40b365ac9b6dd8b75abbfdb9cb687
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
* Add Cache partitioning policies to manage and enforce cache partitioning:
* Add Way partition policy
* Add MaxCapacity partition policy
* Add PartitionFieldsExtension Extension class for Packets to store
Partition IDs for cache partitioning and monitoring
* Modify Cache Tags SimObjects to store partition policies
* Modify Cache Tags block eviction logic to use new partitioning policies
* Add example system and TrafficGen configurations for testing Cache
Partitioning Policies
Change-Id: Ic3fb0f35cf060783fbb9289380721a07e18fad49
Co-authored-by: Adrian Herrera <adrian.herrera@arm.com>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This change adds a fatal statement to check all params for all
SimObjects have been unproxied before C++ object are created.
The fatal statement notifies the user of a mistake that could
possibly lead to a SimObject to not have its params unproxied.
This mistake could be made by adding a child SimObject with a
name that starts with an underscore.
This PR implements a few changes related to the accumulation offset
which is new in MI200. Previously MI100 contained two vector register
files: the architectural and accumulation register files. These have now
been unified and the architectural register file is twice the size. As a
result of this the dispatch packet set an offset into the unified vector
register file for where the former accumulation registers would go. The
changes are:
- Calculate the accumulation offset from dispatch packet and store in
HSA task.
- Update the accumulation move instructions (v_accvgpr_read/write) to
use it.
- Update the current MFMA instructions to use it.
- Make the MFMA examples more clean.
Initialize x86 process' max stack size to the value given in the process
params, rather than hard-coding it to 8 MB, which made it impossible to
run x86 programs requiring more than 8 MB of stack.
Change-Id: I0b17fe60b016b1e4a82d704ef7ad367974ea6a08
For some simulations with big values for VLEN (e.g. 8k and 16k) there
were more packets created on the fly and, as a consequence, failing the
simulations. The sanity check has been increased in order to solve this
high VLEN cases.
Supervised by [@aarmejach](https://github.com/aarmejach)
Change-Id: I137b0f3113687b3fc9c4154d19ca5e8017e6e992
Co-authored-by: Adrià Armejach <adria.armejach@bsc.es>
Adds categorization of bypassed atomics in TCC to the TBE as either
return or no-return, which gets consumed in pa_performAtomic to
determine if atomic logs should be stored.
Reestablishes TCC bypassed atomics after #546.
Change-Id: Ibc1fa2b795ef1c47c3893a0b1911fa7993522d38
Bypassed write though requests on invalid lines in the TCC should be
written though to the directory. This transition was previously missing.
Change-Id: I16b117c4e085ce6be0ed5297aa0129d52cd35a51
Adds categorization of bypassed atomics in TCC to the TBE as either return
or no-return, which gets consumed in pa_performAtomic to determine if
atomic logs should be stored.
Reestablishes TCC bypassed atomics after #546.
Change-Id: Ibc1fa2b795ef1c47c3893a0b1911fa7993522d38
The original version of `list_changes.py` assumed no more than one
`Change-Id` tag per commit. However, since transitioning to GitHub, the
repository now contains some merge commits containing multiple
`Change-Id`s.
This patch updates `list_changes.py` to support commits with any number
of `Change-Id` tags.
Fix QoS Memory Queue Policies
* Fix assertions in LRG policy to correctly assert requestor and list
validity
* Fix `selectPacket()` in LIFO Queue Policy to correctly return the end
of the `deque` backing store for its packet queue
This commit update the two exiting MFMA instructions to support the
accumulation offset for A, B, and C/D matrix. Additionally uses array
indexed C/D matrix registers to reduce duplicate code. Future MFMA
instructions have up to 16 registers for C/D and this reduces the amount
of code being written.
Change-Id: Ibdc3b6255234a3bab99f115c79e8a0248c800400
Bypassed write though requests on invalid lines in the TCC should be
written though to the directory. This transition was previously
missing.
Change-Id: I16b117c4e085ce6be0ed5297aa0129d52cd35a51
The accum offset is used as an index into the unified VGPR register file
in MI200 and is not the same as a move if accum_offset in the dispatch
packet is non-zero.
Change these instructions to use the stored accum_offset value.
Change-Id: Ib661804f8f5b8392e4c586082c423645f539e641
The accumulation offset is needed for some instructions. In order to
access this value we need to place it somewhere instruction definitions
can access. The most logical place is in the wavefront.
This commit simply copies the value from the HSA task to the wavefront
object.
Change-Id: I44ef62ef32d2421953f096c431dd758e882245b4
Fix#874, in which running se.py with 4GB or more memory (via option
--mem-size=4GB) causes all KVM programs to crash or hang. This occurred
because the m5ops address range (set to 0xFFFF0000-0x100000000)
overlapped with physical memory under such a configuration.
This patch fixes the bug by moving the m5ops address range if phyiscal
memory is >=4GB.
Change-Id: Ic8a004517bc2be2c27860ed314460be749a11dc1
Update the PLIC based on the
[riscv-plic-spec](https://github.com/riscv/riscv-plic-spec) in the PR:
- Support customized PLIC hardID and privilege mode configuration
- Backward compatable with the n_contexts parameter, will generate the
config like {0,M}, {0,S}, {1,M} ...
Change-Id: Ibff736827edb7c97921e01fa27f503574a27a562
In case ReadShared hit on a UD line and there's no sharers, this chage
makes the downstream passes Dirty to the requestor whenever possible
even though it doesn't deallocate the line. This will make the requestor
to SD and the downstream to UD_RSD.
In the previous implementation, loosely exclusive intermediate cache can
cause loss of dirty data. Example error condition is as below.
Configurations
L2 cache: Roughly inclusive to L1 without back-invalidation
- dealloc_on_* = false
- dealloc_backinv_* = false
L3 cache: Roughly exclusive to L2 without back-invalidation
- alloc_on_readshared = tue
- alloc_on_readunique = false
- dealloc_on_shared = false
- dealloc_on_unique = true
- dealloc_backinv_* = false
- is_HN = false
LLC: Same clusivity as L3 except is_HN = true
For all caches, allow_SD = true and fwd_unique_on_readshared = false
Example problem sequence:
1. L1 sends ReadUnique then becomes UD. L2 is UC_RU. L3 and LLC are RU.
2. L1 evicts the line to L2 by WriteBackFull (UD_PD). L2 becomes UD.
3. L2 evicts the line to L3 using WriteBackFull (UD_PD). L3 becomes UD.
4. L1 reads the line with ReadShared which misses on L2.
5. L2 reads the line with ReadShared which hits on L3. L3 becomes UD_RSC
because it doesn't deallocate the line (dataToBeInvalid=false)
6. L3 evicts the line to LLC by WriteCleanFull (UD_PD) because L3
doesn't back-invalidate and still has sharer. The local cache line is
invalidated by Deallocate_CacheBlock. L3 becomes RUSC and LLC becomes
UD_RU.
7. When UD_RU is evicted at LLC, the UD_RU line is dropped expecting the
upstream to writeback, causing loss of dirty data
The empty constructor prevent zero-initialization working correctly. In
this change we fix the issue by removing the unwanted empty constructor.
We also change the default destructor specification with c++11 style.
Change-Id: I869a93ca5283f811c2aa58406f1478459e0d7022
This commit optimizes the address generation logic in the strided
prefetcher by introducing the following changes
(d is the degree of the prefetcher)
* Evaluate the fixed prefetch_stride only once (and not d-times)
* Replace 2d multiplications (d * prefetch_stride and distance *
prefetch_stride) with additions by updating the new base prefetch
address while looping
Change-Id: I3ec0c642bc9ec7635b0d38308797e99b645304bb
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The Stride Prefetcher will skip this number of strides ahead of the
first identified prefetch, then generate `degree` prefetches at
`stride` intervals. A value of zero indicates no skip (i.e. start
prefetching from the next identified prefetch address).
This parameter can be used to increase the timeliness of prefetches by
starting to prefetch far enough ahead of the demand stream to cover
the memory system latency.
[Richard Cooper <richard.cooper@arm.com>:
- Added detail to commit comment and `distance` Param documentation.
- Changed `distance` Param from `Param.Int` to `Param.Unsigned`.
]
Change-Id: I4ce79c72d74445b12acf68e0a54e13966e30041c
Co-authored-by: Richard Cooper <richard.cooper@arm.com>
Signed-off-by: Richard Cooper <richard.cooper@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
pkt->req->isCacheMaintenance() would not include a check
for clean eviction before notifying the prefetcher,
causing gem5 to crash.
Change-Id: I4a56c7384818c63d6e2263f26645e87cef1243cb
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Update the default prefetch options to achieve out-of-the box
prefetcher performance closer to that which a typical user would
expect. Configurations that set these parameters explicitly will be
unaffected.
The new defaults were identified as part of work on gem5 prefetchers
undertaken by Nikolaos Kyparissas while on internship at Arm.
Change-Id: Ia6c1803c86e42feef01de40c34d928de50fe0bed
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Prefetch queue entries were being squashed by comparing the address
of each queued prefetch against the block address of the demand
access. Only prefetches that happen to fall on a cache-line block
boundary would be squashed.
This patch converts the prefetch addresses to block addresses before
comparison.
Change-Id: I3a80a1e3d752f925595e33edebf5359d2cc67182
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
* Fix selectPacket() in LIFO Queue Policy to correctly return the end of
the `deque` backing store for its packet queue
* Move selectPacket() implementations for FIFO and LIFO queues into
`q_policy.cc` file
Change-Id: I8c35e5fc83dc380b19f52be14c18b1f414f9e141
According to the RISC-V spec [1]. Any float-point instructions
accumulate FFLAGS register rather than write it to reflect the CSR
behavior.
In the previous implementation. We read the FFLAGS, set the exception
flags, and write the result back to the FFLAGS. This works in the gem5
simple and minor CPU model as they are actually written to `regFile`
after executing the instructions. However, in the gem5 O3 CPU model, it
will record in the `destMiscReg` buffer until the commit stage when
writing to the `miscReg` in the execution stage. The next instruction
will get the old FFLAGS and cause the incorrect result.
The CL introduced the `MISCREG_FFLAGS_EXE` and used the same size of
`miscRegFile` because the `MISCREG_FFLAGS_EXE` and `MISCREG_FFLAGS`
shared the same space. When executing the float-pointing instruction,
any exception flags should be updated via `MISCREG_FFLAGS_EXE` to
accumulate the FFLAGS in `setMiscReg` method. For the MISCREG_FFLAGS, it
should only be called in the CSROp.
[1] Syntactic Dependencies: Appendix A
c80ecada1c/src/mm-eplan.adoc (syntactic-dependencies-rules-9-11)
gem5 issue: https://github.com/gem5/gem5/issues/755
Change-Id: Ib7f13d95b8a921c37766a54a217a5a4b1ef17c6f
Fix#876. The x87 floating-point control word (FCW) was not initialized
at process startup in syscall emulation mode. This resulted in floating
point exceptions in KVM mode when executing x87 floating-point
instructions.
This patch fixes the bug by initializing FCW to its reset value, 0x37F.
Change-Id: Idd1573c6951524ef59466cc5c9f1e640ea7658ae
ArmSigInterruptPin don't send the interrupt to GIC. Instead it sends the
interrupt to the irq specified in Param. When using ArmSigInterruptPin,
we shouldn't ask users to provide "Platform" since it doesn't need it.
To reduce the confusion, this change removes the dependency of Platform
for ArmSigInterruptPin.
Change-Id: I0ee507ed1c08b4fa6d3e384e28732f3acb4f6892
This PR is fixing https://github.com/gem5/gem5/issues/668. It fixes it
for all ISAs other than Arm with the first commit, which is setting the
number of architectural Matrix registers to 0 for those ISA which are
not using them.
It then partly fixes it for Arm as well with the 2nd commit: by removing
RenameMap::numFreeEntries we don't stall renaming unless a matrix
instruction is encountered... This means most binaries will run with SMT
as long as they don't use FEAT_SME instructions. Please note: this is
not simply a SMT fix, it will generally address a shortcoming in the way
we were renaming instructions.
If an Arm binary wants to use SMT with FEAT_SME, the 4th commit will
make sure the lack of physical registers is notified explicitly at the
beginning of simulation, rather than silently blocking renaming
When processing memory Packets for prefetch, the `PrefetchInfo` class
constructor will attempt to copy the `Packet` data. In cases where the
`Packet` under consideration does not contain data, an assertion will be
triggered in the Packet's `getConstPtr` method, causing the simulation
to crash.
This problem was first exposed by Bug #580 when processing an
`UpgradeReq` memory packet.
This patch addresses the problem by suppressing the copying of the
`Packet` data during construction of a `PrefetchInfo` object in cases
where the `Packet` has no data.
This patch addresses Bug #580 [1], which was exposed by PR #564 [2],
subsequently reverted by PR #581 [3]
[1] https://github.com/gem5/gem5/issues/580
[2] https://github.com/gem5/gem5/pull/564
[3] https://github.com/gem5/gem5/pull/581
Change-Id: Ic1e828c0887f4003441b61647440c8e912bf0fbc
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>