* Add Cache partitioning policies to manage and enforce cache
partitioning:
* Add Way partition policy
* Add MaxCapacity partition policy
* Add PartitionFieldsExtension Extension class for Packets to store
Partition IDs for cache partitioning and monitoring
* Modify Cache SimObjects to store partition policies
* Modify Cache block eviction logic to use new partitioning policies
Co-authored-by: Adrian Herrera <adrian.herrera@arm.com>
Change-Id: Ib35153a8b46803c22a433926270d82e5e19ce544
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This pull request contains a set of small patches which fix some bugs in
the gem5 prefetchers, and aligns out-of-the box prefetcher performance
more closely with that which a typical user would expect.
The performance patches have been tested with an out-of-the-box
(untuned) Stride prefetcher configuration against a set of SPEC 2017
SimPoints, and show a small-to-modest IPC uplift across the about half
the benchmarks, with no significant IPC degradation.
The new defaults were identified as part of work on gem5 prefetchers
undertaken by Nikolaos Kyparissas while on internship at Arm.
This PR is an updated version of PR #564, which was reverted due to Bug
#580. Bug #580 was fixed in PR #871. This PR updates #564 to the latest
state of the develop branch, and should be applied after PR #871.
* Add Cache partitioning policies to manage and enforce cache partitioning:
* Add Way partition policy
* Add MaxCapacity partition policy
* Add PartitionFieldsExtension Extension class for Packets to store
Partition IDs for cache partitioning and monitoring
* Modify Cache Tags SimObjects to store partition policies
* Modify Cache Tags block eviction logic to use new partitioning policies
* Add example system and TrafficGen configurations for testing Cache
Partitioning Policies
Change-Id: Ic3fb0f35cf060783fbb9289380721a07e18fad49
Co-authored-by: Adrian Herrera <adrian.herrera@arm.com>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
For some simulations with big values for VLEN (e.g. 8k and 16k) there
were more packets created on the fly and, as a consequence, failing the
simulations. The sanity check has been increased in order to solve this
high VLEN cases.
Supervised by [@aarmejach](https://github.com/aarmejach)
Change-Id: I137b0f3113687b3fc9c4154d19ca5e8017e6e992
Co-authored-by: Adrià Armejach <adria.armejach@bsc.es>
Adds categorization of bypassed atomics in TCC to the TBE as either
return or no-return, which gets consumed in pa_performAtomic to
determine if atomic logs should be stored.
Reestablishes TCC bypassed atomics after #546.
Change-Id: Ibc1fa2b795ef1c47c3893a0b1911fa7993522d38
Adds categorization of bypassed atomics in TCC to the TBE as either return
or no-return, which gets consumed in pa_performAtomic to determine if
atomic logs should be stored.
Reestablishes TCC bypassed atomics after #546.
Change-Id: Ibc1fa2b795ef1c47c3893a0b1911fa7993522d38
Bypassed write though requests on invalid lines in the TCC should be
written though to the directory. This transition was previously
missing.
Change-Id: I16b117c4e085ce6be0ed5297aa0129d52cd35a51
In case ReadShared hit on a UD line and there's no sharers, this chage
makes the downstream passes Dirty to the requestor whenever possible
even though it doesn't deallocate the line. This will make the requestor
to SD and the downstream to UD_RSD.
In the previous implementation, loosely exclusive intermediate cache can
cause loss of dirty data. Example error condition is as below.
Configurations
L2 cache: Roughly inclusive to L1 without back-invalidation
- dealloc_on_* = false
- dealloc_backinv_* = false
L3 cache: Roughly exclusive to L2 without back-invalidation
- alloc_on_readshared = tue
- alloc_on_readunique = false
- dealloc_on_shared = false
- dealloc_on_unique = true
- dealloc_backinv_* = false
- is_HN = false
LLC: Same clusivity as L3 except is_HN = true
For all caches, allow_SD = true and fwd_unique_on_readshared = false
Example problem sequence:
1. L1 sends ReadUnique then becomes UD. L2 is UC_RU. L3 and LLC are RU.
2. L1 evicts the line to L2 by WriteBackFull (UD_PD). L2 becomes UD.
3. L2 evicts the line to L3 using WriteBackFull (UD_PD). L3 becomes UD.
4. L1 reads the line with ReadShared which misses on L2.
5. L2 reads the line with ReadShared which hits on L3. L3 becomes UD_RSC
because it doesn't deallocate the line (dataToBeInvalid=false)
6. L3 evicts the line to LLC by WriteCleanFull (UD_PD) because L3
doesn't back-invalidate and still has sharer. The local cache line is
invalidated by Deallocate_CacheBlock. L3 becomes RUSC and LLC becomes
UD_RU.
7. When UD_RU is evicted at LLC, the UD_RU line is dropped expecting the
upstream to writeback, causing loss of dirty data
This commit optimizes the address generation logic in the strided
prefetcher by introducing the following changes
(d is the degree of the prefetcher)
* Evaluate the fixed prefetch_stride only once (and not d-times)
* Replace 2d multiplications (d * prefetch_stride and distance *
prefetch_stride) with additions by updating the new base prefetch
address while looping
Change-Id: I3ec0c642bc9ec7635b0d38308797e99b645304bb
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The Stride Prefetcher will skip this number of strides ahead of the
first identified prefetch, then generate `degree` prefetches at
`stride` intervals. A value of zero indicates no skip (i.e. start
prefetching from the next identified prefetch address).
This parameter can be used to increase the timeliness of prefetches by
starting to prefetch far enough ahead of the demand stream to cover
the memory system latency.
[Richard Cooper <richard.cooper@arm.com>:
- Added detail to commit comment and `distance` Param documentation.
- Changed `distance` Param from `Param.Int` to `Param.Unsigned`.
]
Change-Id: I4ce79c72d74445b12acf68e0a54e13966e30041c
Co-authored-by: Richard Cooper <richard.cooper@arm.com>
Signed-off-by: Richard Cooper <richard.cooper@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
pkt->req->isCacheMaintenance() would not include a check
for clean eviction before notifying the prefetcher,
causing gem5 to crash.
Change-Id: I4a56c7384818c63d6e2263f26645e87cef1243cb
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Update the default prefetch options to achieve out-of-the box
prefetcher performance closer to that which a typical user would
expect. Configurations that set these parameters explicitly will be
unaffected.
The new defaults were identified as part of work on gem5 prefetchers
undertaken by Nikolaos Kyparissas while on internship at Arm.
Change-Id: Ia6c1803c86e42feef01de40c34d928de50fe0bed
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Prefetch queue entries were being squashed by comparing the address
of each queued prefetch against the block address of the demand
access. Only prefetches that happen to fall on a cache-line block
boundary would be squashed.
This patch converts the prefetch addresses to block addresses before
comparison.
Change-Id: I3a80a1e3d752f925595e33edebf5359d2cc67182
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
* Fix selectPacket() in LIFO Queue Policy to correctly return the end of
the `deque` backing store for its packet queue
* Move selectPacket() implementations for FIFO and LIFO queues into
`q_policy.cc` file
Change-Id: I8c35e5fc83dc380b19f52be14c18b1f414f9e141
When processing memory Packets for prefetch, the `PrefetchInfo` class
constructor will attempt to copy the `Packet` data. In cases where the
`Packet` under consideration does not contain data, an assertion will be
triggered in the Packet's `getConstPtr` method, causing the simulation
to crash.
This problem was first exposed by Bug #580 when processing an
`UpgradeReq` memory packet.
This patch addresses the problem by suppressing the copying of the
`Packet` data during construction of a `PrefetchInfo` object in cases
where the `Packet` has no data.
This patch addresses Bug #580 [1], which was exposed by PR #564 [2],
subsequently reverted by PR #581 [3]
[1] https://github.com/gem5/gem5/issues/580
[2] https://github.com/gem5/gem5/pull/564
[3] https://github.com/gem5/gem5/pull/581
Change-Id: Ic1e828c0887f4003441b61647440c8e912bf0fbc
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
This commit adds support for cache invalidation in GPU VIPER protocol's
SQC cache. To support this, the commit also adds L1 cache invalidation
framework in the Sequencer such that the Sequencer sends out an
invalidation request for each line in the cache and declares completion
once all lines are evicted.
Change-Id: I2f52eacabb2412b16f467f994e985c378230f841
This PR removes a circular dependency between `QoSMemSinkCtrl` and
`QoSMemSinkInterface` that prevented the `controller()` function of
`QoSMemSinkInterface` from being used by removing the default value for
`QoSMemSinkCtrl.interface`.
Change-Id: I4ecc39b974e239be1a2e9285e1f6f8ea873c018d
This commit allows CompData_SD be sent when ReadShared hits on UD line and
the local cache keeps the line, unless the request doesn't allow SD.
Change-Id: I337f24c871cc4c19c5b5fb11f9b35c0a8eb7911c
In case ReadShared hit on a UD line and there's no sharers, this chage
makes the downstream respond with Unique even though it doesn't deallocate
the line. This will make the requestor to UD and the downstream to UD_RU.
In the previous implementation, loosely exclusive intermediate cache can
cause loss of dirty data. Example sequence is as below.
Configurations
L2 cache: Roughly inclusive to L1 without back-invalidation
- dealloc_on_* = false
- dealloc_backinv_* = false
L3 cache: Roughly exclusive to L2 without back-invalidation
- alloc_on_readshared = tue
- alloc_on_readunique = false
- dealloc_on_shared = false
- dealloc_on_unique = true
- dealloc_backinv_* = false
- is_HN = false
LLC: Same clusivity as L3 except is_HN = true
For all caches, allow_SD = true and fwd_unique_on_readshared = false
Example problem sequence:
1. L1 sends ReadUnique then becomes UD. L2 is UC_RU. L3 and LLC are RU.
2. L1 evicts the line to L2 by WriteBackFull (UD_PD). L2 becomes UD.
3. L2 evicts the line to L3 using WriteBackFull (UD_PD). L3 becomes UD.
4. L1 reads the line with ReadShared which misses on L2.
5. L2 reads the line with ReadShared which hits on L3. L3 becomes UD_RSC
because it doesn't deallocate the line (dataToBeInvalid=false)
6. L3 evicts the line to LLC by WriteCleanFull (UD_PD) because L3 doesn't
back-invalidate and still has sharer. The local cache line is
invalidated by Deallocate_CacheBlock.
L3 becomes RUSC and LLC becomes UD_RU.
7. When UD_RU is evicted at LLC, the UD_RU line is dropped expecting the
upstream to writeback, causing loss of dirty data.
Change-Id: Ic9bee27f2ec8906dd5df8bd3be60e5a9a76c782f
In Ruby CHI protocol UD_RU state means the line is in UD state in
the local cache and the upstream may have it in UD or UC state.
In the previous implementation UD_RU line was just dropped without
WriteBack which can cause loss of dirty data when the upstream has it
in UC state.
This commit fixes it by performing WriteBack when evciting UD_RU line.
Change-Id: I1db9b4f95cc576e71dcef38b01de24775df514ba
Related to issue #703 , this PR removes GCN3 related files and updates
source code, documentation, and tests to switch over to Vega is that was
not done already. Highlights are:
- Remove all src/arch/amdgpu/gcn3 files and update Kconfigs.
- Remove references to GCN3 and replace with Vega where applicable.
- Update the build targets in the gcn-gpu Docker. This will need to be
rebuilt but not urgently.
- Remove the GCN3 tag in testlib. Most tests seem to be using Vega
already, so that commit is small.
Replace instances of "GCN3" with Vega. Remove gfx801 and gfx803. Rename
FIJI to Vega and Carrizo to Raven.
Using misc since there is not enough room to fit all the tags.
Change-Id: Ibafc939d49a69be9068107a906e878408c7a5891
When restoring the simulate_limit_event pointer is not
restored after running the dry simulation run which ends up in
"Panic: event not found!"
In this commit we fix this issue by correctly restoring
the pointer value along with the event queue head
Change-Id: Id5ad4d2a270a6cd34eec1dc5c9b170b2b84610d4
---------
Co-authored-by: narya <nitish.arya@bsc.es>
Co-authored-by: Jason Lowe-Power <jason@lowepower.com>
- The bytesRead and bytesWritten stat had duplicate names. Updated
bytesRead and bytesWritten for dram_interface and nvm_interface
Change-Id: I7658e8a0d12ef6b95819bcafa52a85424f01ac76
The WriteUniqueZero is an immediate write to a Snoopable address region
that does not require any data transfer (cacheline is zeroed)
Change-Id: Ia8c9b40e08a3b7d613f0b62ce0ac4b0547860871
Reviewed-by: Tiago Muck <tiago.muck@arm.com>
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Stash requests will simply be discarded by the Home Node This will
return a CompI response to the RNF
Change-Id: I9c2ce5d4d42f380d1a554933d381cf8a8590ba22
Reviewed-by: Tiago Muck <tiago.muck@arm.com>
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The stats initialization in the AbstractMemory allocates the space
according to the max requestors of the System. This may cause issues in
multiple system simulation.
Given there are two system A and B. A has one requestor and a memory,
while B has two requestors. When the requestor with requestor id 2
sending requests to the meomry in A, the simulator would crash because
requestor id 2 is out of the allocated space.
Current solution is adding a SysBridge between across A and B which
would rewrite the requestor id to a valid one. This solution works but
it needs to the bridge at the correct boundary which may not easy. In
addition, the stats would record a mapped data which may not accurate.
To reduce the complexity, we add an flag to AbstractMemory to control
the stats. If users don't want the statistics and want to solve the
cross system issue simply, users can disable the statistics collection.
We also makes the flag by default True to not disturb current users.
Change-Id: Ibb46a63d216d4f310b3e920815a295073496ea6e
clang correctly found that the functions `inCache`, `hasBeenPrefetched`
and `inMissQueue` had the wrong signatures in the DVM funcs files. These
functions are unused, so this change just updates their signatures.
Change-Id: Id669ff661e1c6c46eaf04ea1f17cd9866a9e49ed
Update the RubyCache debug flag prints in CacheMemory to be more
descriptive and make clearer what is happening in a given function.
This makes it easier to determine what is happening when looking at the
RubyCache debug flags prints.
Change-Id: Ieee172b6df0d100f4b1e8fe4bba872fc9cf65854
Update the RubyCache debug flag prints in CacheMemory to be more
descriptive and make clearer what is happening in a given function.
This makes it easier to determine what is happening when looking at the
RubyCache debug flags prints.
Change-Id: Ieee172b6df0d100f4b1e8fe4bba872fc9cf65854
This patch adds supports for using the "classic" prefetchers with ruby
cache controllers.
This pull request includes a few commits making the changes in this
order:
- Refactor decouples the classic cache and prefetchers interfaces
- Extras probes for later integration with ruby
- General ruby-side support
- Adds support for the CHI protocol
Commit [mem-ruby: support prefetcher in CHI
protocol](2bdb65653b)
may be used as example on how to add support for other protocols.
JIRA issues that may be related to this pull request:
https://gem5.atlassian.net/browse/GEM5-457https://gem5.atlassian.net/browse/GEM5-1112
This patch adds RubyPrefetcherProxy, which provides means to inject
requests generated by the "classic" prefetchers into a SLICC prefetch
queue. It defines defines notifyPf* functions to be used by protocols
to notify a prefetcher. It also includes the probes required to
interface with the classic implementation.
AbstractController defines the accessor needed to snoop the caches.
A followup patch will add support for RubyPrefetcherProxy in the
CHI protocol.
Related JIRA:
https://gem5.atlassian.net/browse/GEM5-457https://gem5.atlassian.net/browse/GEM5-1112
Additional authors:
Tuan Ta <tuan.ta2@arm.com>
Change-Id: Ie908150b510f951cdd6fd0fd9c95d9760ff70fb0
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
hasBeenPrefetched can now take a requestor id and returns true only if
the block was prefetched by a prefetcher with the same id. This may be
necessary to properly train multiple prefetchers attached to the same
cache. If returns true if the block was prefetched by any prefetcher
when the id is not provided.
Related JIRA:
https://gem5.atlassian.net/browse/GEM5-457https://gem5.atlassian.net/browse/GEM5-1112
Change-Id: I205e000fd5ff100e5a5d24d88bca7c6a46689ab2
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Remove the prefetch_on_access and prefetch_on_pf_hit from BaseCache.
BasePrefetch no longer expects this params to exist in the parent.
Configurations that set these parameter using the cache object were
fixed.
Change-Id: I9ab6a545eaf930ee41ebda74e2b6b8bad0ca35a7
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
This patches decouples the prefetchers from the cache implementation
as the first step to allow using the classic prefetchers with ruby
caches. The prefetchers that need do cache lookups can do so using
the accessor object provided when the probes are notified. This may
also facilitate connecting the same prefetcher to multiple caches.
Related JIRA:
https://gem5.atlassian.net/browse/GEM5-457https://gem5.atlassian.net/browse/GEM5-1112
Change-Id: I4fee1a3613ae009fabf45d7b747e4582cad315ef
Signed-off-by: Tiago Mück <tiago.muck@arm.com>