Implementation of C-Pack, as described in "C-Pack: A High-
Performance Microprocessor Cache Compression Algorithm".
C-Pack uses pattern matching schemes to detect and compress
frequently appearing data patterns. As in the original paper,
it divides the input in 32-bit words, and uses 6 patterns to
match with its dictionary.
For the patterns, each letter represents a byte: Z is a null
byte, M is a dictionary match, X is a new value. The patterns
are ZZZZ, XXXX, MMMM, MMXX, ZZZX, MMMX.
Change-Id: I2efc9db2c862620dcc1155300e39be558f9017e0
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/11105
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reset the fault status always before translation is initiated in
pushRequest() in the LSQ. This avoids the problem when a strictly
ordered load needs to be re-executed multiple times. If the
translation is delayed at one of those attempts then the
internal panicFault (from the previous execution attempt) can get
fired at commit.
Change-Id: I0c22b2f7afd6e2cb00bc359a4a01042efd2d01d2
Signed-off-by: Gabor Dozsa <gabor.dozsa@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19388
Reviewed-by: Ciro Santilli <ciro.santilli@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Renamed member variables to comply with general naming
conventional outside of the ruby folder so that the
filters can be moved out.
Moved code to base to reduce code duplication.
Renamed the private get_index functions to hash, to make their
functionality explicit.
Change-Id: Ic6519cfc5e09ea95bc502a29b27f750f04eda754
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18734
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Most of the index based functions were not implemented, and a
user is more likely to be interested in checking the filter
contents based on an address than an index.
As a side effect, the Bulk's hash function became unused, and
according to the paper permute() was doing more than just
permuting, so it was renamed.
Change-Id: I6423a2565a082fee2e7f11fa489a11f253064d99
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18732
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
In some cases, the point where you create a Coroutine is not the same as
where you want to start running it (and want it to switch back to). This
leads to the unnecessary overhead of switching in and out of the
Coroutine. This change adds an optional boolean argument to the
constructor for the Coroutine class to allow for overriding the default
behavior of running the Coroutine upon creation, which in specific cases
can be used to avoid the unnecessary overhead and improve simulator
performance.
Change-Id: I044698f85e81ee4144208aee30d133bcb462d35d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18588
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Normally, a translation will start via translateTiming/functional
which will check if the miscRegs have been updated and if so,
will update the TLB state accordingly. However, in a 2 stage
system, if there is a hit in stage 1, the resulting IPA will be
sent to the S2-TLB for translation via a getTE() function call
(via the stage2_lookup object). This will cause the state of the
S2-TLB to be out of sync.
Change-Id: I117e4032fc76d7d31f4f999887b5573a7e5811e6
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/14995
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
In large configs the tooltip may be greater then the maximum line
size graphviz supports when parsing the dot file (typically 16k).
Adding '/' causes graphviz to break the string in multiple lines
while parsing and works around this limitation.
Change-Id: I16a0030127de4165080de97f5213309eed9fdeca
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19208
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This is trying to fix the bug that arises when a memory exception
is generated during a fp flavoured load (A memory load targeting
a SIMD & FP register).
With the previous template a fault was not stopping the register
value to be modified (wrong)
if (fault == NoFault) {
fault = readMemAtomic(xc, traceData, EA, Mem, memAccessFlags);
%(memacc_code)s;
}
if (fault == NoFault) {
%(op_wb)s;
}
The patch introduces a Load64FpExecute template which is moving the
register write (memacc_code) just before the op_wb
Change-Id: I1c89c525dfa7a4ef489abe0872cd7baacdd6ce3c
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19228
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
AddrRange does not attempt to merge interleaved address ranges if it
has only one of the ranges.
This is needed to allow XBars to accept request targeting only one
part of a interleaved address range. A use case for this would be
modeling distributed LLCs in which a XBar is used solely to
encapsulate the snoop filter of a single LLC slice.
Change-Id: If71c9cf1444ee11916611afb51eab3a4f1d93985
Signed-off-by: Tiago Muck <tiago.muck@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18788
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Previously an AddrRange could express interleaving using a number of
consecutive bits and in additional optionally a second number of
consecutive bits. The two sets of consecutive bits would be xored and
matched against a value to determine if an address is in the
AddrRange. For example:
sel[0] = a[8] ^ a[12]
sel[1] = a[9] ^ a[13]
where sel == intlvMatch
This change extends AddrRange to allow more flexible interleavings
with an abritary number of set of bits which do not need be
consecutive. For example:
sel[0] = a[8] ^ a[11] ^ a[13]
sel[1] = a[15] ^ a[17] ^ a[19]
where sel == intlvMatch
Change-Id: I42220a6d5011a31f0560535762a25bfc823c3ebb
Signed-off-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19130
Maintainer: Jason Lowe-Power <jason@lowepower.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
This is an implementation of the SMMUv3 architecture.
What can it do?
- Single-stage and nested translation with 4k or 64k granule. 16k would
be straightforward to add.
- Large pages are supported.
- Works with any gem5 device as long as it is issuing packets with a
valid (Sub)StreamId
What it can't do?
- Fragment stage 1 page when the underlying stage 2 page is smaller. S1
page size > S2 page size is not supported
- Invalidations take zero time. This wouldn't be hard to fix.
- Checkpointing is not supported
- Stall/resume for faulting transactions is not supported
Additional contributors:
- Michiel W. van Tol <Michiel.VanTol@arm.com>
- Giacomo Travaglini <giacomo.travaglini@arm.com>
Change-Id: Ibc606fccd9199b2c1ba739c6335c846ffaa4d564
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19008
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Adding an option to enable DRAM low-power states. The low power
states can have a significant impact on application performance
(sim_ticks) on the order of 2-3x, especially for compute-gpu apps.
The options allows for it to easily be enabled/disabled to compare
performance numbers. The option is disabled by default.
Change-Id: Ib9bddbb792a1a6a4afb5339003472ff8f00a5859
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18548
Reviewed-by: Wendy Elsasser <wendy.elsasser@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Add NUMBER_BITS_PER_SET environment variable to control
the size of the bitmask in Set.hh (default=64).
Necessary for configs which require >64 instances of a given
machine type. This can be set in the build_opts file, e.g.
by adding the following line:
NUMBER_BITS_PER_SET = <number>
Change-Id: I314a3cadca8ce975fcf4a60d9022494751688e88
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18968
Reviewed-by: Tiago Mück <tiago.muck@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
Tested-by: kokoro <noreply+kokoro@google.com>