Gem5 Hardware Transactional Memory (HTM)
Here we provide a brief note describing HTM support in Gem5 at
a high level.
HTM is an architectural feature that enables speculative concurrency in
a shared-memory system; groups of instructions known as transactions are
executed as an atomic unit. The system allows that transactions be
executed concurrently but intervenes if a transaction's
atomicity/isolation is jeapordised and takes corrective action. In this
implementation, corrective active explicitely means rolling back a
thread's architectural state and reverting any memory updates to a point
just before the transaction began.
This HTM implementation relies on--
(1) A checkpointing mechanism for architectural register state.
(2) Buffering speculative memory updates.
This patch is focusing on the definition of the HTM checkpoint (1)
The checkpointing mechanism is architecture dependent. Each ISA
leveraging HTM support can define a class HTMCheckpoint inhereting from
the generic one (GenericISA::HTMCheckpoint).
Those will need to save/restore the architectural state by overriding
the virtual HTMCheckpoint::save (when starting a transaction) and
HTMCheckpoint::restore (when aborting a transaction).
Instances of this class live in O3's ThreadState and Atomic's
SimpleThread. It is up to the ISA to populate this instance when
executing an instruction that begins a new transaction.
JIRA: https://gem5.atlassian.net/browse/GEM5-587
Change-Id: Icd8d1913d23652d78fe89e930ab1e302eb52363d
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30314
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Allow all sub-compressors of BDI to be successful as long as
they are able to compress. Then, BDI's actual size threshold
acts as the cutting point.
This situation arises on any multi compressor; yet, generalizing
this assumption might be too bold.
Change-Id: Iec5057d16d4a7ba5fb573133a30ea10869bd67e0
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33386
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The size can be zero in special occasions, which would
generate divisions by zero. This patch expands the
stats to support them. It also fixes the compression
factor calculation in the Multi compressor.
As a side effect, now that zero sizes are handled, allow
the Zero compressor to generate it.
Change-Id: I9f7dee76576b09fdc9bef3e1f3f89be3726dcbd9
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33383
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
When compressing using a multi-compressor, one must be able to
identify which sub-compressor should be used to decompress data.
This can be achieved by either adding encoding bits to block's
tag or data entry.
It was previously assumed that these encoding bits would be added
to the tag, but now make it a parameter that defaults to the data
entry.
Change-Id: Id322425e7a6ad59cb2ec7a4167a43de4c55c482c
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33380
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The compressors are not able to process a whole line at once,
so they must divide it into multiple same-sized chunks. This
patch makes the base compressor responsible for this division,
so that the derived classes are mostly agnostic to this
translation.
This change has been coupled with a change of the signature
of the public compress() to avoid introducing a temporary
function rename. Previously, this function did not return
the compressed data, under the assumption that everything
related to the compressed data would be handled by the
compressor. However, sometimes the units using the compressor
could need to know or store the compressed data.
For example, when sharing dictionaries the compressed data
must be checked to determine if two blocks can co-allocate
(DISH, Panda et al. 2016).
Change-Id: Id8dbf68936b1457ca8292cc0a852b0f0a2eeeb51
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33379
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Previously either the compression data was the one declared within
DictionaryCompressor, or the derived class would have to override
the compress() to use a derived compression data.
With this change, the instantiation can be overridden, and thus
any derived class can choose the compression data pointer type
they need to use.
Change-Id: I387936265a3de6785a6096c7a6bd21774202b1c7
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33378
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
When applying the bitwise not to a short integer the compiler
automatically promotes it to an integer. For example, if a 8-bit
mask=0xFF, and the compiler decides to promote the mask to 32-bit
to apply the bitwise not, ~mask=0xFFFFFF00, which will yield wrong
results for popcount(): expected=0, got=24.
Change-Id: I95efba5532c27ca004ff6947d4b51a8a14f09741
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33374
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
isa_traits.hh used to have much more in it, but now it only has
PageShift, PageBytes, and (for now) the guest endianness. These values
should only be retrieved from the System class generally speaking, so
only the system class should include arch/isa_traits.hh.
Some gpu compute related files need PageBytes or PageShift. Even though
those files don't advertise their ISA dependence, they are tied to x86.
In those files, they can include arch/x86/isa_traits.hh.
The only other file which legitimately needs arch/isa_traits.hh is the
decoder cache since it uses PageBytes to size an array.
Change-Id: I12686368715623e3140a68a7027c136bd52567b1
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33203
Reviewed-by: Gabe Black <gabeblack@google.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
After this change, if you use these classes or inherit from these
classes, the compiler will now give you a warning that these names are
deprecated. Instead, you should use ResponsePort and RequestPort,
respectively.
This patch simply deprecates these names. The following patches will
convert all of the code in gem5 to use these new names. The first step
is converting the class names and the uses of these classes, then we
will update the variable names to be more precise as well.
Change-Id: I5e6e90b2916df4dbfccdaabe97423f377a1f6e3f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32308
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Creation of the Compressor namespace. It encapsulates all the cache
compressors, and other classes used by them.
The following classes have been renamed:
BaseCacheCompressor -> Base
PerfectCompressor - Perfect
RepeatedQwordsCompressor -> RepeatedQwords
ZeroCompressor -> Zero
BaseDictionaryCompressor and DictionaryCompressor were not renamed
because the there is a high probability that users may want to
create a Dictionary class that encompasses the dictionary contained
by these compressors.
To apply this patch one must force recompilation (e.g., by deleting
it) of build/<arch>/params/BaseCache.hh (and any other files that
were previously using these compressors).
Change-Id: I78cb3b6fb8e3e50a52a04268e0e08dd664d81230
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33294
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The `BasePrefetcher` python class had members `_events` and `_tlbs`
defined as lists, meaning that any call to `list.append` on them would
affect `_events` and `_tlbs` for all prefetchers, not just the calling
object. This change redefines them as instance members to fix the
problem.
Change-Id: I68feb1d6d78e2fa5e8775afba8c81c6dd0de6c60
Signed-off-by: Isaac Sánchez Barrera <isaac.sanchez@bsc.es>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32394
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
There are race conditions while running several benchmarks, where
the DMA engine and the CorePair simultaneously send requests for the
same block. This patch fixes two scenarios
(a) If the request from the DMA engine arrives before the one from the
CorePair, the directory controller records it as a pending request.
However, once the DMA request is serviced, the directory doesn't check
for pending requests. The CorePair, consequently, never sees a response
to its request and this results in a Deadlock.
Added call to wakeUpDependents in the transition from BDR_Pm to U
Added call to wakeUpDependents in the transition from BDW_P to U
(b) If the request from the CorePair is being serviced by the directory
and the DMA requests for the same block, this causes an invalid
transition because the current coherence doesn't take care of this
scenario.
Added transition state where the requests from DMA are added to the
stall buffer.
Updated B to U CoreUnblock transition to check all buffers, as the DMA
requests were being placed later in the stall buffer than was being checked
Change-Id: I5a76efef97723bc53cf239ea7e112f84fc874ef8
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31996
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Bradford Beckmann <brad.beckmann@amd.com>
Maintainer: Bradford Beckmann <brad.beckmann@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
In checkpoint output files, the parameters for page table including
size and entries are organized not very clearly. For example:
[system.cpu.workload]
...
ptable.size=...
[system.cpu.workload.Entry0]
vaddr=...
paddr=...
flags=...
[system.cpu.workload.Entry1]
...
This commit moves these parameters into a separate section named
'ptable'. For example:
[system.cpu.workload.ptable]
size=...
[system.cpu.workload.ptable.Entry0]
vaddr=...
paddr=...
flags=...
[system.cpu.workload.ptable.Entry1]
...
Change-Id: Iaa4129b3f4f090e8c3651bde90524abba0999c7f
Signed-off-by: Ian Jiang <ianjiang.ict@gmail.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31874
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Reviewed-by: Gabe Black <gabeblack@google.com>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This changeset adds the necessary changes for running
GCN3 ISA with VIPER in apu_se.py.
Changes to the VIPER protocol configs are made to add support
for DMA and scalar caches.
hsaTopology is added to help the pseudo FS create the files
needed by ROCm to understand the device on which the SW is
being run.
Change-Id: I0f47a6a36bb241a26972c0faafafcf332a7d7d1f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30274
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Bradford Beckmann <brad.beckmann@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Adding getter and setter methods for getting and setting the atomic ops
in the WriteMask class. This allows for message types with WriteMasks to
get or set the atomic ops without explicitly modifying the constructor
for the message type. This will beused by the DMASequencer which uses the
SequencerMsg type where the constructor is auto generated via SLICC.
Change-Id: I71787d294c1b89547618e9a13e386b65bb3e1021
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31474
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The data rate is used by the drampower lib to estimate the power
consumption of the DRAM Core. Previously, we used the formula:
burst_cycles = divCeil(p->tBURST_MAX, p->tCK);
data_rate = p->burst_length / burst_cycles;
to derive the data_rate. However, under certain configurations this
formula computes the wrong result due to rounding errors. This patch
simplifies the way we derive the data_rate by passing the value of the
DRAM parameter beats_per_clock.
Change-Id: Ic8cd35bb4641d9c0a704675d2672a6fe4f4ec13e
Signed-off-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Wendy Elsasser <wendy.elsasser@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30056
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Compiling gem5 with dramsim2 included fails due to some inconsistencies in
including SimObjects. In this patch this issue is fixed along with
temporarily disabling -Werror=nonnull-compare in CCFLAGS. Also, the remote
for cloning dramsim2 has been changed.
Change-Id: Ia24095150d026d736352aaf0d735b7554ede10bb
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31434
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The RubyPrefetcher uses makeNextStrideAddress() with
a negative stride to find prefetched address.
The type of this expression is:
uint64_t + uint32_t * int;
This gives wrong result due to implicit conversion.
Fix this with static cast and it works correctly:
uint64_t + int * int;
Change-Id: I36e17e00d5c66c3699fe1d5b287971225a162d04
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31314
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This patch adds the ability for a host-OS process external to gem5
to access the backing store via POSIX shared memory.
The new param shared_backstore of the System object is the filename
of the shared memory (i.e., the first argument to shm_open()).
Change-Id: I98c948a32a15049a4515e6c02a14595fb5fe379f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30994
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Adds support for device memories in the system and RubySystem classes.
Devices may register memory ranges with the system class and packets
which originate from the device MasterID will update the device memory
in Ruby. In RubySystem functional access is updated to keep the packets
within the Ruby network they originated from.
Change-Id: I47850df1dc1994485d471ccd9da89e8d88eb0d20
JIRA: https://gem5.atlassian.net/browse/GEM5-470
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29653
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Previously, with HSAIL, we were guaranteed by the HSA specification
that the GPU will never issue unaligned accesses. However, now
that we are directly running GCN this is no longer true.
Accordingly, this commit adds support for unaligned accesses.
Moreover, to reduce the replication of nearly identical
code for the different request types, I also added new helper
functions that are called by all the different memory request
producing instruction types in op_encodings.hh.
Adding support for unaligned instructions requires changing
the statusBitVector used to track the status of the memory
requests for each lane from a bit per lane to an int per lane.
This is necessary because an unaligned access may span multiple
cache lines. In the worst case, each lane may span multiple
cache lines. There are corresponding changes in the files that
use the statusBitVector.
Change-Id: I319bf2f0f644083e98ca546d2bfe68cf87a5f967
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29920
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>