mem-ruby: HTM mem implementation

This patch augments the MESI_Three_Level Ruby protocol with hardware
transactional memory support.

The HTM implementation relies on buffering of speculative memory updates.
The core notifies the L0 cache controller that a new transaction has
started and the controller in turn places itself in transactional state
(htmTransactionalState := true).

When operating in transactional state, the usual MESI protocol changes
slightly. Lines loaded or stored are marked as part of the transaction's
read and write sets, respectively. If there is an invalidation request
to a cache line in the read or write set, the transaction is marked as
failed.
Similarly, if there is a read request by another core to a speculatively
written cache line, i.e. one in the write set, the transaction is marked
as failed. Once failed, all subsequent loads and stores from the core
are made benign, i.e. turned into NOPs at the cache controller, and
responses are marked to indicate that the transactional state has
failed. When the core receives these marked responses, it generates an
HtmFailureFault with the reason for the transaction failure. Servicing
this fault does two things:

(a) Restores the architectural checkpoint
(b) Sends an HTM abort signal to the cache controller

The restoration includes all registers in the checkpoint as well as the
program counter of the instruction before the transaction started.
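The failure rules above can be summarized in a small model. This is an illustrative Python sketch, not gem5 code; the class and method names (CacheLine, L0Controller, on_remote_invalidation, etc.) are hypothetical stand-ins for the SLICC state machine:

```python
class CacheLine:
    """Minimal stand-in for a cache line with HTM read/write set flags."""
    def __init__(self):
        self.in_read_set = False
        self.in_write_set = False

class L0Controller:
    """Sketch of the L0 controller's transactional bookkeeping."""
    def __init__(self):
        self.htm_transactional_state = False
        self.htm_failed = False

    def begin_transaction(self):
        self.htm_transactional_state = True
        self.htm_failed = False

    def on_load(self, line):
        # Loads add the line to the read set while transactional.
        if self.htm_transactional_state and not self.htm_failed:
            line.in_read_set = True

    def on_store(self, line):
        # Stores add the line to the write set while transactional.
        if self.htm_transactional_state and not self.htm_failed:
            line.in_write_set = True

    def on_remote_invalidation(self, line):
        # An invalidation of any line in the read or write set
        # dooms the transaction.
        if self.htm_transactional_state and \
                (line.in_read_set or line.in_write_set):
            self.htm_failed = True

    def on_remote_read(self, line):
        # A remote read of a speculatively written line also fails it.
        if self.htm_transactional_state and line.in_write_set:
            self.htm_failed = True
```

Once `htm_failed` is set, the real controller NOPs subsequent memory requests and flags its responses so the core can raise the fault.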

The abort signal is sent to the L0 cache controller and resets the
failed transactional state. It resets the transactional read and write
sets and invalidates any speculatively written cache lines.  It also
exits the transactional state so that the MESI protocol operates as
usual.
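The abort behaviour mirrors the cache walk in this patch's htmAbortTransaction(): every speculatively written line is invalidated and both sets are cleared. A rough sketch, using a simplified hypothetical Line object rather than gem5's AbstractCacheEntry:

```python
class Line:
    """Simplified cache line for illustration."""
    def __init__(self, in_read_set=False, in_write_set=False):
        self.in_read_set = in_read_set
        self.in_write_set = in_write_set
        self.valid = True

def htm_abort(lines):
    """Invalidate speculatively written lines and clear both HTM sets.
    Returns the read/write set sizes, which the patch samples as
    per-cache statistics."""
    read_set_size = write_set_size = 0
    for line in lines:
        if line.in_read_set:
            read_set_size += 1
        if line.in_write_set:
            write_set_size += 1
            line.valid = False  # discard speculative data
        line.in_read_set = False
        line.in_write_set = False
    return read_set_size, write_set_size
```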

Alternatively, if the instructions within a transaction complete without
triggering a HtmFailureFault, the transaction can be committed. The core
is responsible for notifying the cache controller that the transaction
is complete and the cache controller makes all speculative writes
visible to the rest of the system and exits the transactional state.
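A commit, by contrast, clears the same bookkeeping without discarding data: dropping the set markings is what makes the speculative writes ordinary (visible) modified lines again. Sketch under the same simplified model as above, not gem5 code:

```python
class Line:
    """Simplified cache line for illustration."""
    def __init__(self, in_read_set=False, in_write_set=False):
        self.in_read_set = in_read_set
        self.in_write_set = in_write_set
        self.valid = True

def htm_commit(lines):
    """Make speculative writes visible: keep the data valid and simply
    clear the read/write set flags so normal MESI handling resumes."""
    for line in lines:
        line.in_read_set = False
        line.in_write_set = False
```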

Notifying the cache controller is done through HtmCmd requests, which
are a subtype of Load requests.
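For illustration, the command classification (matching the isHtmCmdRequest() helper added by this patch) can be sketched as:

```python
# The four HTM command types this patch adds to RubyRequestType.
# They travel down the same path as loads but are distinguished
# by their request type.
HTM_COMMANDS = {"HTM_Start", "HTM_Commit", "HTM_Cancel", "HTM_Abort"}

def is_htm_cmd(request_type):
    """Sketch of isHtmCmdRequest(): true only for HTM command types."""
    return request_type in HTM_COMMANDS
```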

KUDOS:
The code is based on a previous pull request by Pradip Vallathol who
developed HTM and TSX support in gem5 as part of his master's thesis:

http://reviews.gem5.org/r/2308/index.html

JIRA: https://gem5.atlassian.net/browse/GEM5-587

Change-Id: Icc328df93363486e923b8bd54f4d77741d8f5650
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30319
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Author: Timothy Hayes
Date: 2020-01-14 16:29:09 +00:00
Committed-by: Giacomo Travaglini
Parent: 1c61dae99b
Commit: 0a8a787de3
24 changed files with 2885 additions and 34 deletions


@@ -0,0 +1,6 @@
# Copyright (c) 2019 ARM Limited
# All rights reserved.
TARGET_ISA = 'arm'
CPU_MODELS = 'TimingSimpleCPU,O3CPU'
PROTOCOL = 'MESI_Three_Level_HTM'


@@ -0,0 +1,337 @@
# Copyright (c) 2006-2007 The Regents of The University of Michigan
# Copyright (c) 2009,2015 Advanced Micro Devices, Inc.
# Copyright (c) 2013 Mark D. Hill and David A. Wood
# Copyright (c) 2020 ARM Limited
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer;
# redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution;
# neither the name of the copyright holders nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
import math
import m5
from m5.objects import *
from m5.defines import buildEnv
from .Ruby import create_topology, create_directories
from .Ruby import send_evicts
from common import FileSystemConfig

#
# Declare caches used by the protocol
#
class L0Cache(RubyCache): pass
class L1Cache(RubyCache): pass
class L2Cache(RubyCache): pass

def define_options(parser):
    parser.add_option("--num-clusters", type = "int", default = 1,
        help = "number of clusters in a design in which there are shared "
               "caches private to clusters")
    parser.add_option("--l0i_size", type="string", default="4096B")
    parser.add_option("--l0d_size", type="string", default="4096B")
    parser.add_option("--l0i_assoc", type="int", default=1)
    parser.add_option("--l0d_assoc", type="int", default=1)
    parser.add_option("--l0_transitions_per_cycle", type="int", default=32)
    parser.add_option("--l1_transitions_per_cycle", type="int", default=32)
    parser.add_option("--l2_transitions_per_cycle", type="int", default=4)
    parser.add_option("--enable-prefetch", action="store_true", default=False,
        help="Enable Ruby hardware prefetcher")
    return
def create_system(options, full_system, system, dma_ports, bootmem,
                  ruby_system):

    if buildEnv['PROTOCOL'] != 'MESI_Three_Level_HTM':
        fatal("This script requires the MESI_Three_Level_HTM protocol to be "
              "built.")

    cpu_sequencers = []

    #
    # The ruby network creation expects the list of nodes in the system to be
    # consistent with the NetDest list. Therefore the l1 controller nodes
    # must be listed before the directory nodes and directory nodes before
    # dma nodes, etc.
    #
    l0_cntrl_nodes = []
    l1_cntrl_nodes = []
    l2_cntrl_nodes = []
    dma_cntrl_nodes = []

    assert (options.num_cpus % options.num_clusters == 0)
    num_cpus_per_cluster = options.num_cpus // options.num_clusters

    assert (options.num_l2caches % options.num_clusters == 0)
    num_l2caches_per_cluster = options.num_l2caches // options.num_clusters

    l2_bits = int(math.log(num_l2caches_per_cluster, 2))
    block_size_bits = int(math.log(options.cacheline_size, 2))
    l2_index_start = block_size_bits + l2_bits

    #
    # Must create the individual controllers before the network to ensure the
    # controller constructors are called before the network constructor
    #
    for i in range(options.num_clusters):
        for j in range(num_cpus_per_cluster):
            #
            # First create the Ruby objects associated with this cpu
            #
            l0i_cache = L0Cache(size = options.l0i_size,
                                assoc = options.l0i_assoc,
                                is_icache = True,
                                start_index_bit = block_size_bits,
                                replacement_policy = LRURP())

            l0d_cache = L0Cache(size = options.l0d_size,
                                assoc = options.l0d_assoc,
                                is_icache = False,
                                start_index_bit = block_size_bits,
                                replacement_policy = LRURP())

            # the ruby random tester reuses num_cpus to specify the
            # number of cpu ports connected to the tester object, which
            # is stored in system.cpu. because there is only ever one
            # tester object, num_cpus is not necessarily equal to the
            # size of system.cpu; therefore if len(system.cpu) == 1
            # we use system.cpu[0] to set the clk_domain, thereby ensuring
            # we don't index off the end of the cpu list.
            if len(system.cpu) == 1:
                clk_domain = system.cpu[0].clk_domain
            else:
                clk_domain = system.cpu[i].clk_domain

            # Ruby prefetcher
            prefetcher = RubyPrefetcher(
                num_streams = 16,
                unit_filter = 256,
                nonunit_filter = 256,
                train_misses = 5,
                num_startup_pfs = 4,
                cross_page = True
            )

            l0_cntrl = L0Cache_Controller(
                version = i * num_cpus_per_cluster + j,
                Icache = l0i_cache, Dcache = l0d_cache,
                transitions_per_cycle = options.l0_transitions_per_cycle,
                prefetcher = prefetcher,
                enable_prefetch = options.enable_prefetch,
                send_evictions = send_evicts(options),
                clk_domain = clk_domain,
                ruby_system = ruby_system)

            cpu_seq = RubyHTMSequencer(version = i * num_cpus_per_cluster + j,
                                       icache = l0i_cache,
                                       clk_domain = clk_domain,
                                       dcache = l0d_cache,
                                       ruby_system = ruby_system)

            l0_cntrl.sequencer = cpu_seq

            l1_cache = L1Cache(size = options.l1d_size,
                               assoc = options.l1d_assoc,
                               start_index_bit = block_size_bits,
                               is_icache = False)

            l1_cntrl = L1Cache_Controller(
                version = i * num_cpus_per_cluster + j,
                cache = l1_cache, l2_select_num_bits = l2_bits,
                cluster_id = i,
                transitions_per_cycle = options.l1_transitions_per_cycle,
                ruby_system = ruby_system)

            exec("ruby_system.l0_cntrl%d = l0_cntrl"
                 % (i * num_cpus_per_cluster + j))
            exec("ruby_system.l1_cntrl%d = l1_cntrl"
                 % (i * num_cpus_per_cluster + j))

            #
            # Add controllers and sequencers to the appropriate lists
            #
            cpu_sequencers.append(cpu_seq)
            l0_cntrl_nodes.append(l0_cntrl)
            l1_cntrl_nodes.append(l1_cntrl)

            # Connect the L0 and L1 controllers
            l0_cntrl.prefetchQueue = MessageBuffer()
            l0_cntrl.mandatoryQueue = MessageBuffer()
            l0_cntrl.bufferToL1 = MessageBuffer(ordered = True)
            l1_cntrl.bufferFromL0 = l0_cntrl.bufferToL1
            l0_cntrl.bufferFromL1 = MessageBuffer(ordered = True)
            l1_cntrl.bufferToL0 = l0_cntrl.bufferFromL1

            # Connect the L1 controllers and the network
            l1_cntrl.requestToL2 = MessageBuffer()
            l1_cntrl.requestToL2.master = ruby_system.network.slave
            l1_cntrl.responseToL2 = MessageBuffer()
            l1_cntrl.responseToL2.master = ruby_system.network.slave
            l1_cntrl.unblockToL2 = MessageBuffer()
            l1_cntrl.unblockToL2.master = ruby_system.network.slave

            l1_cntrl.requestFromL2 = MessageBuffer()
            l1_cntrl.requestFromL2.slave = ruby_system.network.master
            l1_cntrl.responseFromL2 = MessageBuffer()
            l1_cntrl.responseFromL2.slave = ruby_system.network.master

        for j in range(num_l2caches_per_cluster):
            l2_cache = L2Cache(size = options.l2_size,
                               assoc = options.l2_assoc,
                               start_index_bit = l2_index_start)

            l2_cntrl = L2Cache_Controller(
                version = i * num_l2caches_per_cluster + j,
                L2cache = l2_cache, cluster_id = i,
                transitions_per_cycle = options.l2_transitions_per_cycle,
                ruby_system = ruby_system)

            exec("ruby_system.l2_cntrl%d = l2_cntrl"
                 % (i * num_l2caches_per_cluster + j))
            l2_cntrl_nodes.append(l2_cntrl)

            # Connect the L2 controllers and the network
            l2_cntrl.DirRequestFromL2Cache = MessageBuffer()
            l2_cntrl.DirRequestFromL2Cache.master = ruby_system.network.slave
            l2_cntrl.L1RequestFromL2Cache = MessageBuffer()
            l2_cntrl.L1RequestFromL2Cache.master = ruby_system.network.slave
            l2_cntrl.responseFromL2Cache = MessageBuffer()
            l2_cntrl.responseFromL2Cache.master = ruby_system.network.slave

            l2_cntrl.unblockToL2Cache = MessageBuffer()
            l2_cntrl.unblockToL2Cache.slave = ruby_system.network.master
            l2_cntrl.L1RequestToL2Cache = MessageBuffer()
            l2_cntrl.L1RequestToL2Cache.slave = ruby_system.network.master
            l2_cntrl.responseToL2Cache = MessageBuffer()
            l2_cntrl.responseToL2Cache.slave = ruby_system.network.master

    # Run each of the ruby memory controllers at a ratio of the frequency of
    # the ruby system.
    # clk_divider value is a fix to pass regression.
    ruby_system.memctrl_clk_domain = DerivedClockDomain(
        clk_domain = ruby_system.clk_domain, clk_divider = 3)

    mem_dir_cntrl_nodes, rom_dir_cntrl_node = create_directories(
        options, bootmem, ruby_system, system)
    dir_cntrl_nodes = mem_dir_cntrl_nodes[:]
    if rom_dir_cntrl_node is not None:
        dir_cntrl_nodes.append(rom_dir_cntrl_node)
    for dir_cntrl in dir_cntrl_nodes:
        # Connect the directory controllers and the network
        dir_cntrl.requestToDir = MessageBuffer()
        dir_cntrl.requestToDir.slave = ruby_system.network.master
        dir_cntrl.responseToDir = MessageBuffer()
        dir_cntrl.responseToDir.slave = ruby_system.network.master
        dir_cntrl.responseFromDir = MessageBuffer()
        dir_cntrl.responseFromDir.master = ruby_system.network.slave
        dir_cntrl.requestToMemory = MessageBuffer()
        dir_cntrl.responseFromMemory = MessageBuffer()

    for i, dma_port in enumerate(dma_ports):
        #
        # Create the Ruby objects associated with the dma controller
        #
        dma_seq = DMASequencer(version = i, ruby_system = ruby_system)

        dma_cntrl = DMA_Controller(version = i,
                                   dma_sequencer = dma_seq,
                                   transitions_per_cycle = options.ports,
                                   ruby_system = ruby_system)

        exec("ruby_system.dma_cntrl%d = dma_cntrl" % i)
        exec("ruby_system.dma_cntrl%d.dma_sequencer.slave = dma_port" % i)
        dma_cntrl_nodes.append(dma_cntrl)

        # Connect the dma controller to the network
        dma_cntrl.mandatoryQueue = MessageBuffer()
        dma_cntrl.responseFromDir = MessageBuffer(ordered = True)
        dma_cntrl.responseFromDir.slave = ruby_system.network.master
        dma_cntrl.requestToDir = MessageBuffer()
        dma_cntrl.requestToDir.master = ruby_system.network.slave

    all_cntrls = l0_cntrl_nodes + \
                 l1_cntrl_nodes + \
                 l2_cntrl_nodes + \
                 dir_cntrl_nodes + \
                 dma_cntrl_nodes

    # Create the io controller and the sequencer
    if full_system:
        io_seq = DMASequencer(version = len(dma_ports),
                              ruby_system = ruby_system)
        ruby_system._io_port = io_seq
        io_controller = DMA_Controller(version = len(dma_ports),
                                       dma_sequencer = io_seq,
                                       ruby_system = ruby_system)
        ruby_system.io_controller = io_controller

        # Connect the dma controller to the network
        io_controller.mandatoryQueue = MessageBuffer()
        io_controller.responseFromDir = MessageBuffer(ordered = True)
        io_controller.responseFromDir.slave = ruby_system.network.master
        io_controller.requestToDir = MessageBuffer()
        io_controller.requestToDir.master = ruby_system.network.slave

        all_cntrls = all_cntrls + [io_controller]

    # Register configuration with filesystem
    else:
        for i in range(options.num_clusters):
            for j in range(num_cpus_per_cluster):
                FileSystemConfig.register_cpu(
                    physical_package_id = 0,
                    core_siblings = range(options.num_cpus),
                    core_id = i * num_cpus_per_cluster + j,
                    thread_siblings = [])

                FileSystemConfig.register_cache(
                    level = 0,
                    idu_type = 'Instruction',
                    size = options.l0i_size,
                    line_size = options.cacheline_size,
                    assoc = 1,
                    cpus = [i * num_cpus_per_cluster + j])

                FileSystemConfig.register_cache(
                    level = 0,
                    idu_type = 'Data',
                    size = options.l0d_size,
                    line_size = options.cacheline_size,
                    assoc = 1,
                    cpus = [i * num_cpus_per_cluster + j])

                FileSystemConfig.register_cache(
                    level = 1,
                    idu_type = 'Unified',
                    size = options.l1d_size,
                    line_size = options.cacheline_size,
                    assoc = options.l1d_assoc,
                    cpus = [i * num_cpus_per_cluster + j])

            FileSystemConfig.register_cache(
                level = 2,
                idu_type = 'Unified',
                size = str(MemorySize(options.l2_size) *
                           num_l2caches_per_cluster) + 'B',
                line_size = options.cacheline_size,
                assoc = options.l2_assoc,
                cpus = [n for n in range(i * num_cpus_per_cluster,
                                         (i + 1) * num_cpus_per_cluster)])

    ruby_system.network.number_of_virtual_networks = 3
    topology = create_topology(all_cntrls, options)

    return (cpu_sequencers, mem_dir_cntrl_nodes, topology)


@@ -138,4 +138,5 @@ MakeInclude('system/Sequencer.hh')
# <# include "mem/ruby/protocol/header.hh"> in any file
# generated_dir = Dir('protocol')
MakeInclude('system/GPUCoalescer.hh')
MakeInclude('system/HTMSequencer.hh')
MakeInclude('system/VIPERCoalescer.hh')


@@ -130,6 +130,14 @@ machine(MachineType:L1Cache, "MESI Directory L1 Cache CMP")
Ack_all, desc="Last ack for processor";
WB_Ack, desc="Ack for replacement";
// hardware transactional memory
L0_DataCopy, desc="Data Block from L0. Should remain in M state.";
// L0 cache received the invalidation message and has
// sent a NAK (because of htm abort) saying that the data
// in L1 is the latest value.
L0_DataNak, desc="L0 received INV message, specifies its data is also stale";
}
// TYPES
@@ -361,6 +369,10 @@ machine(MachineType:L1Cache, "MESI Directory L1 Cache CMP")
if(in_msg.Class == CoherenceClass:INV_DATA) {
trigger(Event:L0_DataAck, in_msg.addr, cache_entry, tbe);
} else if (in_msg.Class == CoherenceClass:NAK) {
trigger(Event:L0_DataNak, in_msg.addr, cache_entry, tbe);
} else if (in_msg.Class == CoherenceClass:PUTX_COPY) {
trigger(Event:L0_DataCopy, in_msg.addr, cache_entry, tbe);
} else if (in_msg.Class == CoherenceClass:INV_ACK) {
trigger(Event:L0_Ack, in_msg.addr, cache_entry, tbe);
} else {
@@ -808,18 +820,6 @@ machine(MachineType:L1Cache, "MESI Directory L1 Cache CMP")
k_popL0RequestQueue;
}
transition(EE, Load, E) {
hh_xdata_to_l0;
uu_profileHit;
k_popL0RequestQueue;
}
transition(MM, Load, M) {
hh_xdata_to_l0;
uu_profileHit;
k_popL0RequestQueue;
}
transition({S,SS}, Store, SM) {
i_allocateTBE;
c_issueUPGRADE;
@@ -1034,7 +1034,7 @@ machine(MachineType:L1Cache, "MESI Directory L1 Cache CMP")
kd_wakeUpDependents;
}
transition(SM, L0_Invalidate_Else, SM_IL0) {
transition(SM, {Inv,L0_Invalidate_Else}, SM_IL0) {
forward_eviction_to_L0_else;
}
@@ -1093,4 +1093,55 @@ machine(MachineType:L1Cache, "MESI Directory L1 Cache CMP")
transition({S_IL0, M_IL0, E_IL0, MM_IL0}, {Inv, Fwd_GETX, Fwd_GETS}) {
z2_stallAndWaitL2Queue;
}
// hardware transactional memory
// If a transaction has aborted, the L0 could re-request
// data which is in E or EE state in L1.
transition({EE,E}, Load, E) {
hh_xdata_to_l0;
uu_profileHit;
k_popL0RequestQueue;
}
// If a transaction has aborted, the L0 could re-request
// data which is in M or MM state in L1.
transition({MM,M}, Load, M) {
hh_xdata_to_l0;
uu_profileHit;
k_popL0RequestQueue;
}
// If a transaction has aborted, the L0 could re-request
// data which is in M state in L1.
transition({E,M}, Store, M) {
hh_xdata_to_l0;
uu_profileHit;
k_popL0RequestQueue;
}
// A transaction may have tried to modify a cache block in M state with
// non-speculative (pre-transactional) data. This needs to be copied
// to the L1 before any further modifications occur at the L0.
transition({M,E}, L0_DataCopy, M) {
u_writeDataFromL0Request;
k_popL0RequestQueue;
}
transition({M_IL0, E_IL0}, L0_DataCopy, M_IL0) {
u_writeDataFromL0Request;
k_popL0RequestQueue;
}
// A NAK from the L0 means that the L0 invalidated its
// modified line (due to an abort) so it is therefore necessary
// to use the L1's correct version instead
transition({M_IL0, E_IL0}, L0_DataNak, MM) {
k_popL0RequestQueue;
kd_wakeUpDependents;
}
transition(I, L1_Replacement) {
ff_deallocateCacheBlock;
}
}


@@ -48,6 +48,7 @@ enumeration(CoherenceClass, desc="...") {
INV_OWN, desc="Invalidate (own)";
INV_ELSE, desc="Invalidate (else)";
PUTX, desc="Replacement message";
PUTX_COPY, desc="Data block to be copied in L1. L0 will still be in M state";
WB_ACK, desc="Writeback ack";
@@ -59,6 +60,7 @@ enumeration(CoherenceClass, desc="...") {
DATA, desc="Data block for L1 cache in S state";
DATA_EXCLUSIVE, desc="Data block for L1 cache in M/E state";
ACK, desc="Generic invalidate ack";
NAK, desc="Used by L0 to tell L1 that it cannot provide the latest value";
// This is a special case in which the L1 cache lost permissions to the
// shared block before it got the data. So the L0 cache can use the data

File diff suppressed because it is too large.


@@ -0,0 +1,9 @@
protocol "MESI_Three_Level_HTM";
include "RubySlicc_interfaces.slicc";
include "MESI_Two_Level-msg.sm";
include "MESI_Three_Level-msg.sm";
include "MESI_Three_Level_HTM-L0cache.sm";
include "MESI_Three_Level-L1cache.sm";
include "MESI_Two_Level-L2cache.sm";
include "MESI_Two_Level-dir.sm";
include "MESI_Two_Level-dma.sm";


@@ -1,5 +1,5 @@
/*
* Copyright (c) 2019 ARM Limited
* Copyright (c) 2020 ARM Limited
* All rights reserved.
*
* The license below extends only to copyright in the software and shall
@@ -167,6 +167,31 @@ enumeration(RubyRequestType, desc="...", default="RubyRequestType_NULL") {
Release, desc="Release operation";
Acquire, desc="Acquire opertion";
AcquireRelease, desc="Acquire and Release opertion";
HTM_Start, desc="hardware memory transaction: begin";
HTM_Commit, desc="hardware memory transaction: commit";
HTM_Cancel, desc="hardware memory transaction: cancel";
HTM_Abort, desc="hardware memory transaction: abort";
}
bool isWriteRequest(RubyRequestType type);
bool isDataReadRequest(RubyRequestType type);
bool isReadRequest(RubyRequestType type);
bool isHtmCmdRequest(RubyRequestType type);
// hardware transactional memory
RubyRequestType htmCmdToRubyRequestType(Packet *pkt);
enumeration(HtmCallbackMode, desc="...", default="HtmCallbackMode_NULL") {
HTM_CMD, desc="htm command";
LD_FAIL, desc="htm transaction failed - inform via read";
ST_FAIL, desc="htm transaction failed - inform via write";
}
enumeration(HtmFailedInCacheReason, desc="...", default="HtmFailedInCacheReason_NO_FAIL") {
NO_FAIL, desc="no failure in cache";
FAIL_SELF, desc="failed due local cache's replacement policy";
FAIL_REMOTE, desc="failed due remote invalidation";
FAIL_OTHER, desc="failed due other circumstances";
}
enumeration(SequencerRequestType, desc="...", default="SequencerRequestType_NULL") {


@@ -132,12 +132,18 @@ structure (Sequencer, external = "yes") {
// ll/sc support
void writeCallbackScFail(Addr, DataBlock);
bool llscCheckMonitor(Addr);
void llscClearLocalMonitor();
void evictionCallback(Addr);
void recordRequestType(SequencerRequestType);
bool checkResourceAvailable(CacheResourceType, Addr);
}
structure (HTMSequencer, interface="Sequencer", external = "yes") {
// hardware transactional memory
void htmCallback(Addr, HtmCallbackMode, HtmFailedInCacheReason);
}
structure(RubyRequest, desc="...", interface="Message", external="yes") {
Addr LineAddress, desc="Line address for this request";
Addr PhysicalAddress, desc="Physical address for this request";
@@ -152,6 +158,8 @@ structure(RubyRequest, desc="...", interface="Message", external="yes") {
int wfid, desc="Writethrough wavefront";
uint64_t instSeqNum, desc="Instruction sequence number";
PacketPtr pkt, desc="Packet associated with this request";
bool htmFromTransaction, desc="Memory request originates within a HTM transaction";
int htmTransactionUid, desc="Used to identify the unique HTM transaction that produced this request";
}
structure(AbstractCacheEntry, primitive="yes", external = "yes") {
@@ -185,6 +193,10 @@ structure (CacheMemory, external = "yes") {
void recordRequestType(CacheRequestType, Addr);
bool checkResourceAvailable(CacheResourceType, Addr);
// hardware transactional memory
void htmCommitTransaction();
void htmAbortTransaction();
int getCacheSize();
int getNumBlocks();
Addr getAddressAtIdx(int);


@@ -38,6 +38,7 @@ all_protocols.extend([
'MOESI_AMD_Base',
'MESI_Two_Level',
'MESI_Three_Level',
'MESI_Three_Level_HTM',
'MI_example',
'MOESI_CMP_directory',
'MOESI_CMP_token',


@@ -1,4 +1,16 @@
/*
* Copyright (c) 2020 ARM Limited
* All rights reserved
*
* The license below extends only to copyright in the software and shall
* not be construed as granting a license to any other intellectual
* property including but not limited to intellectual property relating
* to a hardware implementation of the functionality of the software
* licensed hereunder. You may use the software subject to the license
* terms below provided that you ensure that this notice is replicated
* unmodified and in its entirety in all distributions of the software,
* modified or unmodified, in source code or in binary form.
*
* Copyright (c) 1999-2008 Mark D. Hill and David A. Wood
* All rights reserved.
*
@@ -37,6 +49,8 @@ AbstractCacheEntry::AbstractCacheEntry() : ReplaceableEntry()
m_Address = 0;
m_locked = -1;
m_last_touch_tick = 0;
m_htmInReadSet = false;
m_htmInWriteSet = false;
}
AbstractCacheEntry::~AbstractCacheEntry()
@@ -81,3 +95,27 @@ AbstractCacheEntry::isLocked(int context) const
m_Address, m_locked, context);
return m_locked == context;
}
void
AbstractCacheEntry::setInHtmReadSet(bool val)
{
m_htmInReadSet = val;
}
void
AbstractCacheEntry::setInHtmWriteSet(bool val)
{
m_htmInWriteSet = val;
}
bool
AbstractCacheEntry::getInHtmReadSet() const
{
return m_htmInReadSet;
}
bool
AbstractCacheEntry::getInHtmWriteSet() const
{
return m_htmInWriteSet;
}


@@ -1,4 +1,16 @@
/*
* Copyright (c) 2020 ARM Limited
* All rights reserved
*
* The license below extends only to copyright in the software and shall
* not be construed as granting a license to any other intellectual
* property including but not limited to intellectual property relating
* to a hardware implementation of the functionality of the software
* licensed hereunder. You may use the software subject to the license
* terms below provided that you ensure that this notice is replicated
* unmodified and in its entirety in all distributions of the software,
* modified or unmodified, in source code or in binary form.
*
* Copyright (c) 1999-2008 Mark D. Hill and David A. Wood
* All rights reserved.
*
@@ -90,6 +102,18 @@ class AbstractCacheEntry : public ReplaceableEntry
// Set the last access Tick.
void setLastAccess(Tick tick) { m_last_touch_tick = tick; }
// hardware transactional memory
void setInHtmReadSet(bool val);
void setInHtmWriteSet(bool val);
bool getInHtmReadSet() const;
bool getInHtmWriteSet() const;
virtual void invalidateEntry() {}
private:
// hardware transactional memory
bool m_htmInReadSet;
bool m_htmInWriteSet;
};
inline std::ostream&


@@ -1,4 +1,16 @@
/*
* Copyright (c) 2020 ARM Limited
* All rights reserved
*
* The license below extends only to copyright in the software and shall
* not be construed as granting a license to any other intellectual
* property including but not limited to intellectual property relating
* to a hardware implementation of the functionality of the software
* licensed hereunder. You may use the software subject to the license
* terms below provided that you ensure that this notice is replicated
* unmodified and in its entirety in all distributions of the software,
* modified or unmodified, in source code or in binary form.
*
* Copyright (c) 2009 Mark D. Hill and David A. Wood
* All rights reserved.
*
@@ -57,6 +69,8 @@ class RubyRequest : public Message
DataBlock m_WTData;
int m_wfid;
uint64_t m_instSeqNum;
bool m_htmFromTransaction;
uint64_t m_htmTransactionUid;
RubyRequest(Tick curTime, uint64_t _paddr, uint8_t* _data, int _len,
uint64_t _pc, RubyRequestType _type, RubyAccessMode _access_mode,
@@ -71,7 +85,9 @@ class RubyRequest : public Message
m_Prefetch(_pb),
data(_data),
m_pkt(_pkt),
m_contextId(_core_id)
m_contextId(_core_id),
m_htmFromTransaction(false),
m_htmTransactionUid(0)
{
m_LineAddress = makeLineAddress(m_PhysicalAddress);
}
@@ -96,7 +112,9 @@ class RubyRequest : public Message
m_writeMask(_wm_size,_wm_mask),
m_WTData(_Data),
m_wfid(_proc_id),
m_instSeqNum(_instSeqNum)
m_instSeqNum(_instSeqNum),
m_htmFromTransaction(false),
m_htmTransactionUid(0)
{
m_LineAddress = makeLineAddress(m_PhysicalAddress);
}
@@ -122,7 +140,9 @@ class RubyRequest : public Message
m_writeMask(_wm_size,_wm_mask,_atomicOps),
m_WTData(_Data),
m_wfid(_proc_id),
m_instSeqNum(_instSeqNum)
m_instSeqNum(_instSeqNum),
m_htmFromTransaction(false),
m_htmTransactionUid(0)
{
m_LineAddress = makeLineAddress(m_PhysicalAddress);
}


@@ -1,4 +1,16 @@
/*
* Copyright (c) 2020 ARM Limited
* All rights reserved
*
* The license below extends only to copyright in the software and shall
* not be construed as granting a license to any other intellectual
* property including but not limited to intellectual property relating
* to a hardware implementation of the functionality of the software
* licensed hereunder. You may use the software subject to the license
* terms below provided that you ensure that this notice is replicated
* unmodified and in its entirety in all distributions of the software,
* modified or unmodified, in source code or in binary form.
*
* Copyright (c) 1999-2008 Mark D. Hill and David A. Wood
* Copyright (c) 2013 Advanced Micro Devices, Inc.
* All rights reserved.
@@ -85,6 +97,75 @@ inline int max_tokens()
return 1024;
}
inline bool
isWriteRequest(RubyRequestType type)
{
if ((type == RubyRequestType_ST) ||
(type == RubyRequestType_ATOMIC) ||
(type == RubyRequestType_RMW_Read) ||
(type == RubyRequestType_RMW_Write) ||
(type == RubyRequestType_Store_Conditional) ||
(type == RubyRequestType_Locked_RMW_Read) ||
(type == RubyRequestType_Locked_RMW_Write) ||
(type == RubyRequestType_FLUSH)) {
return true;
} else {
return false;
}
}
inline bool
isDataReadRequest(RubyRequestType type)
{
if ((type == RubyRequestType_LD) ||
(type == RubyRequestType_Load_Linked)) {
return true;
} else {
return false;
}
}
inline bool
isReadRequest(RubyRequestType type)
{
if (isDataReadRequest(type) ||
(type == RubyRequestType_IFETCH)) {
return true;
} else {
return false;
}
}
inline bool
isHtmCmdRequest(RubyRequestType type)
{
if ((type == RubyRequestType_HTM_Start) ||
(type == RubyRequestType_HTM_Commit) ||
(type == RubyRequestType_HTM_Cancel) ||
(type == RubyRequestType_HTM_Abort)) {
return true;
} else {
return false;
}
}
inline RubyRequestType
htmCmdToRubyRequestType(const Packet *pkt)
{
if (pkt->req->isHTMStart()) {
return RubyRequestType_HTM_Start;
} else if (pkt->req->isHTMCommit()) {
return RubyRequestType_HTM_Commit;
} else if (pkt->req->isHTMCancel()) {
return RubyRequestType_HTM_Cancel;
} else if (pkt->req->isHTMAbort()) {
return RubyRequestType_HTM_Abort;
}
else {
panic("invalid ruby packet type\n");
}
}
/**
* This function accepts an address, a data block and a packet. If the address
* range for the data block contains the address which the packet needs to


@@ -1,4 +1,16 @@
/*
* Copyright (c) 2020 ARM Limited
* All rights reserved
*
* The license below extends only to copyright in the software and shall
* not be construed as granting a license to any other intellectual
* property including but not limited to intellectual property relating
* to a hardware implementation of the functionality of the software
* licensed hereunder. You may use the software subject to the license
* terms below provided that you ensure that this notice is replicated
* unmodified and in its entirety in all distributions of the software,
* modified or unmodified, in source code or in binary form.
*
* Copyright (c) 1999-2012 Mark D. Hill and David A. Wood
* Copyright (c) 2013 Advanced Micro Devices, Inc.
* All rights reserved.
@@ -31,6 +43,7 @@
#include "base/intmath.hh"
#include "base/logging.hh"
#include "debug/HtmMem.hh"
#include "debug/RubyCache.hh"
#include "debug/RubyCacheTrace.hh"
#include "debug/RubyResourceStalls.hh"
@@ -479,6 +492,23 @@ CacheMemory::clearLocked(Addr address)
entry->clearLocked();
}
void
CacheMemory::clearLockedAll(int context)
{
// iterate through every set and way to get a cache line
for (auto i = m_cache.begin(); i != m_cache.end(); ++i) {
std::vector<AbstractCacheEntry*> set = *i;
for (auto j = set.begin(); j != set.end(); ++j) {
AbstractCacheEntry *line = *j;
if (line && line->isLocked(context)) {
DPRINTF(RubyCache, "Clear Lock for addr: %#x\n",
line->m_Address);
line->clearLocked();
}
}
}
}
bool
CacheMemory::isLocked(Addr address, int context)
{
@@ -578,6 +608,34 @@ CacheMemory::regStats()
.desc("number of stalls caused by data array")
.flags(Stats::nozero)
;
htmTransCommitReadSet
.init(8)
.name(name() + ".htm_transaction_committed_read_set")
.desc("read set size of a committed transaction")
.flags(Stats::pdf | Stats::dist | Stats::nozero | Stats::nonan)
;
htmTransCommitWriteSet
.init(8)
.name(name() + ".htm_transaction_committed_write_set")
.desc("write set size of a committed transaction")
.flags(Stats::pdf | Stats::dist | Stats::nozero | Stats::nonan)
;
htmTransAbortReadSet
.init(8)
.name(name() + ".htm_transaction_aborted_read_set")
.desc("read set size of a aborted transaction")
.flags(Stats::pdf | Stats::dist | Stats::nozero | Stats::nonan)
;
htmTransAbortWriteSet
.init(8)
.name(name() + ".htm_transaction_aborted_write_set")
.desc("write set size of a aborted transaction")
.flags(Stats::pdf | Stats::dist | Stats::nozero | Stats::nonan)
;
}
// assumption: SLICC generated files will only call this function
@@ -655,3 +713,69 @@ CacheMemory::isBlockNotBusy(int64_t cache_set, int64_t loc)
{
return (m_cache[cache_set][loc]->m_Permission != AccessPermission_Busy);
}
/* hardware transactional memory */
void
CacheMemory::htmAbortTransaction()
{
uint64_t htmReadSetSize = 0;
uint64_t htmWriteSetSize = 0;
// iterate through every set and way to get a cache line
for (auto i = m_cache.begin(); i != m_cache.end(); ++i)
{
std::vector<AbstractCacheEntry*> set = *i;
for (auto j = set.begin(); j != set.end(); ++j)
{
AbstractCacheEntry *line = *j;
if (line != nullptr) {
htmReadSetSize += (line->getInHtmReadSet() ? 1 : 0);
htmWriteSetSize += (line->getInHtmWriteSet() ? 1 : 0);
if (line->getInHtmWriteSet()) {
line->invalidateEntry();
}
line->setInHtmWriteSet(false);
line->setInHtmReadSet(false);
line->clearLocked();
}
}
}
htmTransAbortReadSet.sample(htmReadSetSize);
htmTransAbortWriteSet.sample(htmWriteSetSize);
DPRINTF(HtmMem, "htmAbortTransaction: read set=%u write set=%u\n",
htmReadSetSize, htmWriteSetSize);
}
void
CacheMemory::htmCommitTransaction()
{
uint64_t htmReadSetSize = 0;
uint64_t htmWriteSetSize = 0;
// iterate through every set and way to get a cache line
for (auto i = m_cache.begin(); i != m_cache.end(); ++i)
{
std::vector<AbstractCacheEntry*> set = *i;
for (auto j = set.begin(); j != set.end(); ++j)
{
AbstractCacheEntry *line = *j;
if (line != nullptr) {
htmReadSetSize += (line->getInHtmReadSet() ? 1 : 0);
htmWriteSetSize += (line->getInHtmWriteSet() ? 1 : 0);
line->setInHtmWriteSet(false);
line->setInHtmReadSet(false);
line->clearLocked();
}
}
}
htmTransCommitReadSet.sample(htmReadSetSize);
htmTransCommitWriteSet.sample(htmWriteSetSize);
DPRINTF(HtmMem, "htmCommitTransaction: read set=%u write set=%u\n",
htmReadSetSize, htmWriteSetSize);
}
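The two walks above differ only in how they treat speculatively written lines: an abort invalidates every line in the write set before clearing both set markers, while a commit clears the markers and leaves the written data in place, making it architecturally visible. A minimal standalone sketch of that policy follows; the `Line` type and function names are illustrative stand-ins, not gem5 API:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Toy stand-in for AbstractCacheEntry: validity plus HTM set markers.
struct Line {
    bool valid = false;
    bool inReadSet = false;
    bool inWriteSet = false;
};

// Abort: speculatively written lines are discarded; both sets cleared.
// Returns {readSetSize, writeSetSize}, the values sampled into the
// abort histograms.
std::pair<uint64_t, uint64_t> abortTxn(std::vector<Line> &cache) {
    uint64_t rs = 0, ws = 0;
    for (auto &l : cache) {
        if (!l.valid) continue;
        rs += l.inReadSet ? 1 : 0;
        ws += l.inWriteSet ? 1 : 0;
        if (l.inWriteSet) l.valid = false;  // drop the speculative write
        l.inReadSet = l.inWriteSet = false;
    }
    return {rs, ws};
}

// Commit: writes become visible; only the set markers are cleared.
std::pair<uint64_t, uint64_t> commitTxn(std::vector<Line> &cache) {
    uint64_t rs = 0, ws = 0;
    for (auto &l : cache) {
        if (!l.valid) continue;
        rs += l.inReadSet ? 1 : 0;
        ws += l.inWriteSet ? 1 : 0;
        l.inReadSet = l.inWriteSet = false;
    }
    return {rs, ws};
}
```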


@@ -1,4 +1,16 @@
/*
* Copyright (c) 2020 ARM Limited
* All rights reserved
*
* The license below extends only to copyright in the software and shall
* not be construed as granting a license to any other intellectual
* property including but not limited to intellectual property relating
* to a hardware implementation of the functionality of the software
* licensed hereunder. You may use the software subject to the license
* terms below provided that you ensure that this notice is replicated
* unmodified and in its entirety in all distributions of the software,
* modified or unmodified, in source code or in binary form.
*
* Copyright (c) 1999-2012 Mark D. Hill and David A. Wood
* Copyright (c) 2013 Advanced Micro Devices, Inc.
* All rights reserved.
@@ -121,6 +133,7 @@ class CacheMemory : public SimObject
// provided by the AbstractCacheEntry class.
void setLocked (Addr addr, int context);
void clearLocked (Addr addr);
void clearLockedAll (int context);
bool isLocked (Addr addr, int context);
// Print cache contents
@@ -131,6 +144,10 @@ class CacheMemory : public SimObject
bool checkResourceAvailable(CacheResourceType res, Addr addr);
void recordRequestType(CacheRequestType requestType, Addr addr);
// hardware transactional memory
void htmAbortTransaction();
void htmCommitTransaction();
public:
Stats::Scalar m_demand_hits;
Stats::Scalar m_demand_misses;
@@ -150,6 +167,12 @@ class CacheMemory : public SimObject
Stats::Scalar numTagArrayStalls;
Stats::Scalar numDataArrayStalls;
// hardware transactional memory
Stats::Histogram htmTransCommitReadSet;
Stats::Histogram htmTransCommitWriteSet;
Stats::Histogram htmTransAbortReadSet;
Stats::Histogram htmTransAbortWriteSet;
int getCacheSize() const { return m_cache_size; }
int getCacheAssoc() const { return m_cache_assoc; }
int getNumBlocks() const { return m_cache_num_sets * m_cache_assoc; }


@@ -0,0 +1,337 @@
/*
* Copyright (c) 2020 ARM Limited
* All rights reserved
*
* The license below extends only to copyright in the software and shall
* not be construed as granting a license to any other intellectual
* property including but not limited to intellectual property relating
* to a hardware implementation of the functionality of the software
* licensed hereunder. You may use the software subject to the license
* terms below provided that you ensure that this notice is replicated
* unmodified and in its entirety in all distributions of the software,
* modified or unmodified, in source code or in binary form.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are
* met: redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer;
* redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution;
* neither the name of the copyright holders nor the names of its
* contributors may be used to endorse or promote products derived from
* this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "mem/ruby/system/HTMSequencer.hh"
#include "debug/HtmMem.hh"
#include "debug/RubyPort.hh"
#include "mem/ruby/slicc_interface/RubySlicc_Util.hh"
#include "sim/system.hh"
using namespace std;
HtmCacheFailure
HTMSequencer::htmRetCodeConversion(
const HtmFailedInCacheReason ruby_ret_code)
{
switch (ruby_ret_code) {
case HtmFailedInCacheReason_NO_FAIL:
return HtmCacheFailure::NO_FAIL;
case HtmFailedInCacheReason_FAIL_SELF:
return HtmCacheFailure::FAIL_SELF;
case HtmFailedInCacheReason_FAIL_REMOTE:
return HtmCacheFailure::FAIL_REMOTE;
case HtmFailedInCacheReason_FAIL_OTHER:
return HtmCacheFailure::FAIL_OTHER;
default:
panic("Invalid htm return code\n");
}
}
HTMSequencer *
RubyHTMSequencerParams::create()
{
return new HTMSequencer(this);
}
HTMSequencer::HTMSequencer(const RubyHTMSequencerParams *p)
: Sequencer(p)
{
m_htmstart_tick = 0;
m_htmstart_instruction = 0;
}
HTMSequencer::~HTMSequencer()
{
}
void
HTMSequencer::htmCallback(Addr address,
const HtmCallbackMode mode,
const HtmFailedInCacheReason htm_return_code)
{
// mode=0: HTM command
// mode=1: transaction failed - inform via LD
// mode=2: transaction failed - inform via ST
if (mode == HtmCallbackMode_HTM_CMD) {
SequencerRequest* request = nullptr;
assert(m_htmCmdRequestTable.size() > 0);
request = m_htmCmdRequestTable.front();
m_htmCmdRequestTable.pop_front();
assert(isHtmCmdRequest(request->m_type));
PacketPtr pkt = request->pkt;
delete request;
// valid responses have zero as the payload
uint8_t* dataptr = pkt->getPtr<uint8_t>();
memset(dataptr, 0, pkt->getSize());
*dataptr = (uint8_t) htm_return_code;
// record stats
if (htm_return_code == HtmFailedInCacheReason_NO_FAIL) {
if (pkt->req->isHTMStart()) {
m_htmstart_tick = pkt->req->time();
m_htmstart_instruction = pkt->req->getInstCount();
DPRINTF(HtmMem, "htmStart - htmUid=%u\n",
pkt->getHtmTransactionUid());
} else if (pkt->req->isHTMCommit()) {
Tick transaction_ticks = pkt->req->time() - m_htmstart_tick;
Cycles transaction_cycles = ticksToCycles(transaction_ticks);
m_htm_transaction_cycles.sample(transaction_cycles);
m_htmstart_tick = 0;
Counter transaction_instructions =
pkt->req->getInstCount() - m_htmstart_instruction;
m_htm_transaction_instructions.sample(
transaction_instructions);
m_htmstart_instruction = 0;
DPRINTF(HtmMem, "htmCommit - htmUid=%u\n",
pkt->getHtmTransactionUid());
} else if (pkt->req->isHTMAbort()) {
HtmFailureFaultCause cause = pkt->req->getHtmAbortCause();
assert(cause != HtmFailureFaultCause::INVALID);
auto cause_idx = static_cast<int>(cause);
m_htm_transaction_abort_cause[cause_idx]++;
DPRINTF(HtmMem, "htmAbort - reason=%s - htmUid=%u\n",
htmFailureToStr(cause),
pkt->getHtmTransactionUid());
}
} else {
DPRINTF(HtmMem, "HTM_CMD: fail - htmUid=%u\n",
pkt->getHtmTransactionUid());
}
rubyHtmCallback(pkt, htm_return_code);
testDrainComplete();
} else if (mode == HtmCallbackMode_LD_FAIL ||
mode == HtmCallbackMode_ST_FAIL) {
// transaction failed
assert(address == makeLineAddress(address));
assert(m_RequestTable.find(address) != m_RequestTable.end());
auto &seq_req_list = m_RequestTable[address];
while (!seq_req_list.empty()) {
SequencerRequest &request = seq_req_list.front();
PacketPtr pkt = request.pkt;
markRemoved();
// TODO - atomics
// store conditionals should indicate failure
if (request.m_type == RubyRequestType_Store_Conditional) {
pkt->req->setExtraData(0);
}
DPRINTF(HtmMem, "%s_FAIL: size=%d - "
"addr=0x%lx - htmUid=%d\n",
(mode == HtmCallbackMode_LD_FAIL) ? "LD" : "ST",
pkt->getSize(),
address, pkt->getHtmTransactionUid());
rubyHtmCallback(pkt, htm_return_code);
testDrainComplete();
pkt = nullptr;
seq_req_list.pop_front();
}
// free all outstanding requests corresponding to this address
if (seq_req_list.empty()) {
m_RequestTable.erase(address);
}
} else {
panic("unrecognised HTM callback mode\n");
}
}
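The HTM command completion path above relies on a simple payload convention: the response data is zeroed and the first byte carries the Ruby failure code, so the core can distinguish a NO_FAIL start/commit from a failed one. A hedged sketch of just that encode/decode step, with the packet plumbing omitted and the enum values assumed to mirror the generated HtmFailedInCacheReason:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Illustrative mirror of the generated HtmFailedInCacheReason values.
enum class HtmFail : uint8_t { NO_FAIL = 0, FAIL_SELF, FAIL_REMOTE, FAIL_OTHER };

// Zero the payload, then stamp the return code into the first byte,
// as htmCallback does for completed HTM commands.
void encodeHtmResult(std::vector<uint8_t> &payload, HtmFail rc) {
    std::memset(payload.data(), 0, payload.size());
    payload[0] = static_cast<uint8_t>(rc);
}

// The core reads the first byte back to decide whether to fault.
HtmFail decodeHtmResult(const std::vector<uint8_t> &payload) {
    return static_cast<HtmFail>(payload[0]);
}
```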
void
HTMSequencer::regStats()
{
Sequencer::regStats();
// hardware transactional memory
m_htm_transaction_cycles
.init(10)
.name(name() + ".htm_transaction_cycles")
.desc("number of cycles spent in an outer transaction")
.flags(Stats::pdf | Stats::dist | Stats::nozero | Stats::nonan)
;
m_htm_transaction_instructions
.init(10)
.name(name() + ".htm_transaction_instructions")
.desc("number of instructions spent in an outer transaction")
.flags(Stats::pdf | Stats::dist | Stats::nozero | Stats::nonan)
;
auto num_causes = static_cast<int>(HtmFailureFaultCause::NUM_CAUSES);
m_htm_transaction_abort_cause
.init(num_causes)
.name(name() + ".htm_transaction_abort_cause")
.desc("cause of htm transaction abort")
.flags(Stats::total | Stats::pdf | Stats::dist | Stats::nozero)
;
for (unsigned cause_idx = 0; cause_idx < num_causes; ++cause_idx) {
m_htm_transaction_abort_cause.subname(
cause_idx,
htmFailureToStr(HtmFailureFaultCause(cause_idx)));
}
}
void
HTMSequencer::rubyHtmCallback(PacketPtr pkt,
const HtmFailedInCacheReason htm_return_code)
{
// The packet was destined for memory and has not yet been turned
// into a response
assert(system->isMemAddr(pkt->getAddr()) || system->isDeviceMemAddr(pkt));
assert(pkt->isRequest());
// First retrieve the request port from the sender State
RubyPort::SenderState *senderState =
safe_cast<RubyPort::SenderState *>(pkt->popSenderState());
MemSlavePort *port = safe_cast<MemSlavePort*>(senderState->port);
assert(port != nullptr);
delete senderState;
//port->htmCallback(pkt, htm_return_code);
DPRINTF(HtmMem, "HTM callback: start=%d, commit=%d, "
"cancel=%d, rc=%d\n",
pkt->req->isHTMStart(), pkt->req->isHTMCommit(),
pkt->req->isHTMCancel(), htm_return_code);
// turn packet around to go back to requester if response expected
if (pkt->needsResponse()) {
DPRINTF(RubyPort, "Sending packet back over port\n");
pkt->makeHtmTransactionalReqResponse(
htmRetCodeConversion(htm_return_code));
port->schedTimingResp(pkt, curTick());
} else {
delete pkt;
}
trySendRetries();
}
void
HTMSequencer::wakeup()
{
Sequencer::wakeup();
// Check for deadlock of any of the requests
Cycles current_time = curCycle();
// hardware transactional memory commands
std::deque<SequencerRequest*>::iterator htm =
m_htmCmdRequestTable.begin();
std::deque<SequencerRequest*>::iterator htm_end =
m_htmCmdRequestTable.end();
for (; htm != htm_end; ++htm) {
SequencerRequest* request = *htm;
if (current_time - request->issue_time < m_deadlock_threshold)
continue;
panic("Possible Deadlock detected. Aborting!\n"
"version: %d m_htmCmdRequestTable: %d "
"current time: %u issue_time: %d difference: %d\n",
m_version, m_htmCmdRequestTable.size(),
current_time * clockPeriod(),
request->issue_time * clockPeriod(),
(current_time * clockPeriod()) -
(request->issue_time * clockPeriod()));
}
}
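wakeup() extends the base Sequencer deadlock check to the HTM command queue: any queued command that has been waiting at least m_deadlock_threshold cycles trips the panic. Stripped of the panic message, the check reduces to a scan like the following (toy types; the threshold value is an assumption):

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

using Cycles = uint64_t;

struct PendingCmd { Cycles issue_time; };

// Returns true if any queued HTM command has been waiting at least
// `threshold` cycles -- the condition that triggers the panic above.
bool htmCmdDeadlocked(const std::deque<PendingCmd> &q,
                      Cycles now, Cycles threshold) {
    for (const auto &cmd : q) {
        if (now - cmd.issue_time >= threshold)
            return true;
    }
    return false;
}
```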
bool
HTMSequencer::empty() const
{
return Sequencer::empty() && m_htmCmdRequestTable.empty();
}
template <class VALUE>
std::ostream &
operator<<(ostream &out, const std::deque<VALUE> &queue)
{
auto i = queue.begin();
auto end = queue.end();
out << "[";
for (; i != end; ++i)
out << " " << *i;
out << " ]";
return out;
}
void
HTMSequencer::print(ostream& out) const
{
Sequencer::print(out);
out << "+ [HTMSequencer: " << m_version
<< ", htm cmd request table: " << m_htmCmdRequestTable
<< "]";
}
// Insert the request in the request table. Return RequestStatus_Aliased
// if the entry was already present.
RequestStatus
HTMSequencer::insertRequest(PacketPtr pkt, RubyRequestType primary_type,
RubyRequestType secondary_type)
{
if (isHtmCmdRequest(primary_type)) {
// for the moment, allow just one HTM cmd into the cache controller.
// Later this can be adjusted for optimization, e.g.
// back-to-back HTM_Starts.
if ((m_htmCmdRequestTable.size() > 0) && !pkt->req->isHTMAbort())
return RequestStatus_BufferFull;
// insert request into HtmCmd queue
SequencerRequest* htmReq =
new SequencerRequest(pkt, primary_type, secondary_type,
curCycle());
assert(htmReq);
m_htmCmdRequestTable.push_back(htmReq);
return RequestStatus_Ready;
} else {
return Sequencer::insertRequest(pkt, primary_type, secondary_type);
}
}
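insertRequest therefore admits at most one HTM command at a time, with one carve-out: an HTM abort is never refused, since the core must be able to reset the failed transactional state even while another command is queued. The admission decision in isolation, with illustrative types in place of the Ruby ones:

```cpp
#include <cassert>
#include <cstddef>

enum class Status { Ready, BufferFull };

// One HTM command may be outstanding; aborts always get through.
Status admitHtmCmd(std::size_t queuedHtmCmds, bool isAbort) {
    if (queuedHtmCmds > 0 && !isAbort)
        return Status::BufferFull;
    return Status::Ready;
}
```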


@@ -0,0 +1,113 @@
/*
* Copyright (c) 2020 ARM Limited
* All rights reserved
*
* The license below extends only to copyright in the software and shall
* not be construed as granting a license to any other intellectual
* property including but not limited to intellectual property relating
* to a hardware implementation of the functionality of the software
* licensed hereunder. You may use the software subject to the license
* terms below provided that you ensure that this notice is replicated
* unmodified and in its entirety in all distributions of the software,
* modified or unmodified, in source code or in binary form.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are
* met: redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer;
* redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution;
* neither the name of the copyright holders nor the names of its
* contributors may be used to endorse or promote products derived from
* this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef __MEM_RUBY_SYSTEM_HTMSEQUENCER_HH__
#define __MEM_RUBY_SYSTEM_HTMSEQUENCER_HH__
#include <cassert>
#include <iostream>
#include "mem/htm.hh"
#include "mem/ruby/protocol/HtmCallbackMode.hh"
#include "mem/ruby/protocol/HtmFailedInCacheReason.hh"
#include "mem/ruby/system/RubyPort.hh"
#include "mem/ruby/system/Sequencer.hh"
#include "params/RubyHTMSequencer.hh"
class HTMSequencer : public Sequencer
{
public:
HTMSequencer(const RubyHTMSequencerParams *p);
~HTMSequencer();
// callback to acknowledge HTM requests and
// notify cpu core when htm transaction fails in cache
void htmCallback(Addr,
const HtmCallbackMode,
const HtmFailedInCacheReason);
bool empty() const override;
void print(std::ostream& out) const override;
void regStats() override;
void wakeup() override;
private:
/**
* Htm return code conversion
*
* This helper is a hack meant to convert the autogenerated ruby
* enum (HtmFailedInCacheReason) to the manually defined one
* (HtmCacheFailure). This is needed since the cpu code would
* otherwise have to include the ruby generated headers in order
* to handle the htm return code.
*/
HtmCacheFailure htmRetCodeConversion(const HtmFailedInCacheReason rc);
void rubyHtmCallback(PacketPtr pkt, const HtmFailedInCacheReason fail_r);
RequestStatus insertRequest(PacketPtr pkt,
RubyRequestType primary_type,
RubyRequestType secondary_type) override;
// Private copy constructor and assignment operator
HTMSequencer(const HTMSequencer& obj);
HTMSequencer& operator=(const HTMSequencer& obj);
// table/queue for hardware transactional memory commands
// these do not have an address so a deque/queue is used instead.
std::deque<SequencerRequest*> m_htmCmdRequestTable;
Tick m_htmstart_tick;
Counter m_htmstart_instruction;
//! Histogram of cycle latencies of HTM transactions
Stats::Histogram m_htm_transaction_cycles;
//! Histogram of instruction lengths of HTM transactions
Stats::Histogram m_htm_transaction_instructions;
//! Causes for HTM transaction aborts
Stats::Vector m_htm_transaction_abort_cause;
};
inline std::ostream&
operator<<(std::ostream& out, const HTMSequencer& obj)
{
obj.print(out);
out << std::flush;
return out;
}
#endif // __MEM_RUBY_SYSTEM_HTMSEQUENCER_HH__


@@ -1,5 +1,5 @@
/*
* Copyright (c) 2012-2013,2020 ARM Limited
* All rights reserved.
*
* The license below extends only to copyright in the software and shall
@@ -169,6 +169,7 @@ bool RubyPort::MemMasterPort::recvTimingResp(PacketPtr pkt)
{
// got a response from a device
assert(pkt->isResponse());
assert(!pkt->htmTransactionFailedInCache());
// First we must retrieve the request port from the sender State
RubyPort::SenderState *senderState =
@@ -253,6 +254,7 @@ RubyPort::MemSlavePort::recvTimingReq(PacketPtr pkt)
// pio port.
if (pkt->cmd != MemCmd::MemSyncReq) {
if (!isPhysMemAddress(pkt)) {
assert(!pkt->req->isHTMCmd());
assert(ruby_port->memMasterPort.isConnected());
DPRINTF(RubyPort, "Request address %#x assumed to be a "
"pio address\n", pkt->getAddr());
@@ -638,7 +640,6 @@ RubyPort::PioMasterPort::recvRangeChange()
}
}
int
RubyPort::functionalWrite(Packet *func_pkt)
{


@@ -56,6 +56,7 @@ Source('CacheRecorder.cc')
Source('DMASequencer.cc')
if env['BUILD_GPU']:
Source('GPUCoalescer.cc')
Source('HTMSequencer.cc')
Source('RubyPort.cc')
Source('RubyPortProxy.cc')
Source('RubySystem.cc')


@@ -55,6 +55,7 @@
#include "mem/ruby/protocol/PrefetchBit.hh"
#include "mem/ruby/protocol/RubyAccessMode.hh"
#include "mem/ruby/slicc_interface/RubyRequest.hh"
#include "mem/ruby/slicc_interface/RubySlicc_Util.hh"
#include "mem/ruby/system/RubySystem.hh"
#include "sim/system.hh"
@@ -148,6 +149,12 @@ Sequencer::llscCheckMonitor(const Addr address)
}
}
void
Sequencer::llscClearLocalMonitor()
{
m_dataCache_ptr->clearLockedAll(m_version);
}
void
Sequencer::wakeup()
{
@@ -243,7 +250,8 @@ Sequencer::insertRequest(PacketPtr pkt, RubyRequestType primary_type,
// Check if there is any outstanding request for the same cache line.
auto &seq_req_list = m_RequestTable[line_addr];
// Create a default entry
seq_req_list.emplace_back(pkt, primary_type,
secondary_type, curCycle());
m_outstanding_count++;
if (seq_req_list.size() > 1) {
@@ -569,7 +577,10 @@ Sequencer::empty() const
RequestStatus
Sequencer::makeRequest(PacketPtr pkt)
{
// HTM abort signals must be allowed to reach the Sequencer
// the same cycle they are issued. They cannot be retried.
if ((m_outstanding_count >= m_max_outstanding_requests) &&
!pkt->req->isHTMAbort()) {
return RequestStatus_BufferFull;
}
@@ -590,7 +601,7 @@ Sequencer::makeRequest(PacketPtr pkt)
if (pkt->isWrite()) {
DPRINTF(RubySequencer, "Issuing SC\n");
primary_type = RubyRequestType_Store_Conditional;
#if defined (PROTOCOL_MESI_Three_Level) || defined (PROTOCOL_MESI_Three_Level_HTM)
secondary_type = RubyRequestType_Store_Conditional;
#else
secondary_type = RubyRequestType_ST;
@@ -629,7 +640,10 @@ Sequencer::makeRequest(PacketPtr pkt)
//
primary_type = secondary_type = RubyRequestType_ST;
} else if (pkt->isRead()) {
// hardware transactional memory commands
if (pkt->req->isHTMCmd()) {
primary_type = secondary_type = htmCmdToRubyRequestType(pkt);
} else if (pkt->req->isInstFetch()) {
primary_type = secondary_type = RubyRequestType_IFETCH;
} else {
bool storeCheck = false;
@@ -706,6 +720,14 @@ Sequencer::issueRequest(PacketPtr pkt, RubyRequestType secondary_type)
printAddress(msg->getPhysicalAddress()),
RubyRequestType_to_string(secondary_type));
// hardware transactional memory
// If the request originates in a transaction,
// then mark the Ruby message as such.
if (pkt->isHtmTransactional()) {
msg->m_htmFromTransaction = true;
msg->m_htmTransactionUid = pkt->getHtmTransactionUid();
}
Tick latency = cyclesToTicks(
m_controller->mandatoryQueueLatency(secondary_type));
assert(latency > 0);
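issueRequest propagates the transactional context into the Ruby message so the L0 controller can attribute the access to a transaction's read or write set. The tagging itself is a two-field copy, sketched here with stand-in types for the gem5 Packet and RubyRequest:

```cpp
#include <cassert>
#include <cstdint>

struct Pkt {           // stand-in for the Packet's transactional view
    bool inTxn = false;
    uint64_t htmUid = 0;
};

struct RubyMsg {       // stand-in for the RubyRequest message fields
    bool htmFromTransaction = false;
    uint64_t htmTransactionUid = 0;
};

// Mark the message only when the access originates inside a transaction,
// carrying the transaction uid along for debugging/attribution.
void tagIfTransactional(const Pkt &pkt, RubyMsg &msg) {
    if (pkt.inTxn) {
        msg.htmFromTransaction = true;
        msg.htmTransactionUid = pkt.htmUid;
    }
}
```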


@@ -92,7 +92,7 @@ class Sequencer : public RubyPort
DataBlock& data);
// Public Methods
virtual void wakeup(); // Used only for deadlock detection
void resetStats() override;
void collateStats();
void regStats() override;
@@ -114,7 +114,7 @@ class Sequencer : public RubyPort
const Cycles firstResponseTime = Cycles(0));
RequestStatus makeRequest(PacketPtr pkt) override;
virtual bool empty() const;
int outstandingCount() const override { return m_outstanding_count; }
bool isDeadlockEventScheduled() const override
@@ -123,7 +123,7 @@ class Sequencer : public RubyPort
void descheduleDeadlockEvent() override
{ deschedule(deadlockCheckEvent); }
virtual void print(std::ostream& out) const;
void markRemoved();
void evictionCallback(Addr address);
@@ -194,16 +194,22 @@ class Sequencer : public RubyPort
Cycles forwardRequestTime,
Cycles firstResponseTime);
// Private copy constructor and assignment operator
Sequencer(const Sequencer& obj);
Sequencer& operator=(const Sequencer& obj);
protected:
// RequestTable contains both read and write requests, handles aliasing
std::unordered_map<Addr, std::list<SequencerRequest>> m_RequestTable;
Cycles m_deadlock_threshold;
virtual RequestStatus insertRequest(PacketPtr pkt,
RubyRequestType primary_type,
RubyRequestType secondary_type);
private:
int m_max_outstanding_requests;
CacheMemory* m_dataCache_ptr;
CacheMemory* m_instCache_ptr;
@@ -215,9 +221,6 @@ class Sequencer : public RubyPort
Cycles m_data_cache_hit_latency;
Cycles m_inst_cache_hit_latency;
// Global outstanding request count, across all request tables
int m_outstanding_count;
bool m_deadlock_check_scheduled;
@@ -294,6 +297,13 @@ class Sequencer : public RubyPort
* @return a boolean indicating if the line address was found.
*/
bool llscCheckMonitor(const Addr);
/**
* Removes all addresses from the local monitor.
* This is independent of this Sequencer object's version id.
*/
void llscClearLocalMonitor();
};
inline std::ostream&


@@ -1,4 +1,5 @@
# Copyright (c) 2009 Advanced Micro Devices, Inc.
# Copyright (c) 2020 ARM Limited
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
@@ -70,6 +71,11 @@ class RubySequencer(RubyPort):
# 99 is the dummy default value
coreid = Param.Int(99, "CorePair core id")
class RubyHTMSequencer(RubySequencer):
type = 'RubyHTMSequencer'
cxx_class = 'HTMSequencer'
cxx_header = "mem/ruby/system/HTMSequencer.hh"
class DMASequencer(RubyPort):
type = 'DMASequencer'
cxx_header = "mem/ruby/system/DMASequencer.hh"


@@ -1,4 +1,4 @@
# Copyright (c) 2019-2020 ARM Limited
# All rights reserved.
#
# The license below extends only to copyright in the software and shall
@@ -54,6 +54,7 @@ python_class_map = {
"CacheMemory": "RubyCache",
"WireBuffer": "RubyWireBuffer",
"Sequencer": "RubySequencer",
"HTMSequencer": "RubyHTMSequencer",
"GPUCoalescer" : "RubyGPUCoalescer",
"VIPERCoalescer" : "VIPERCoalescer",
"DirectoryMemory": "RubyDirectoryMemory",