The following command was run: ``` pre-commit run --all-files ``` This ensures all the files in the repository are formatted to pass our checks. Change-Id: Ia2fe3529a50ad925d1076a612d60a4280adc40de Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/62572 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
128 lines
6.5 KiB
Plaintext
128 lines
6.5 KiB
Plaintext
/*
|
|
* Copyright (c) 2017-2021 Advanced Micro Devices, Inc.
|
|
* All rights reserved.
|
|
*
|
|
* Redistribution and use in source and binary forms, with or without
|
|
* modification, are permitted provided that the following conditions are met:
|
|
*
|
|
* 1. Redistributions of source code must retain the above copyright notice,
|
|
* this list of conditions and the following disclaimer.
|
|
*
|
|
* 2. Redistributions in binary form must reproduce the above copyright notice,
|
|
* this list of conditions and the following disclaimer in the documentation
|
|
* and/or other materials provided with the distribution.
|
|
*
|
|
* 3. Neither the name of the copyright holder nor the names of its
|
|
* contributors may be used to endorse or promote products derived from this
|
|
* software without specific prior written permission.
|
|
*
|
|
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
|
|
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
|
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
|
* ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
|
|
* LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
|
|
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
|
|
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
|
|
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
|
|
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
|
|
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
|
|
* POSSIBILITY OF SUCH DAMAGE.
|
|
*/
|
|
|
|
This directory contains a tester for gem5 GPU protocols. Unlike the Ruby random
|
|
teter, this tester does not rely on sequential consistency. Instead, it
|
|
assumes tested protocols supports release consistency.
|
|
|
|
----- Getting Started -----
|
|
|
|
To start using the tester quickly, you can use the following example command
|
|
line to get running immediately:
|
|
|
|
build/GCN3_X86/gem5.opt configs/example/ruby_gpu_random_test.py \
|
|
--test-length=1000 --system-size=medium --cache-size=small
|
|
|
|
An overview of the main command line options is as follows. For all options
|
|
use `build/GCN3_X86/gem5.opt configs/example/ruby_gpu_random_test.py --help`
|
|
or see the configuration file.
|
|
|
|
* --cache-size (small, large): Use smaller sizes for testing evict, etc.
|
|
* --system-size (small, medium, large): Effectively the number of threads in
|
|
the GPU model. Large size will have more contention. Larger
|
|
sizes are useful for checking contention.
|
|
* --episode-length (short, medium, long): Number of loads and stores in an
|
|
episode. Episodes will also have atomics mixed in. See below
|
|
for a definition of episode.
|
|
* --test-length (int): Number of episodes to execute. This will determine the
|
|
amount of time the tester runs for. Longer time will stress
|
|
the protocol harder.
|
|
|
|
The remainder of this file describes the theory behind the tester design and
|
|
a link to a more detailed research paper is provided at the end.
|
|
|
|
----- Theory Overview -----
|
|
|
|
The GPU Ruby tester creates a system consisting of both CPU threads and GPU
|
|
wavefronts. CPU threads are scalar, so there is one lane per CPU thread. GPU
|
|
wavefront may have multiple lanes. The number of lanes is initialized when
|
|
a thread/wavefront is created.
|
|
|
|
Each thread/wavefront executes a number of episodes. Each episode is a series
|
|
of memory actions (i.e., atomic, load, store, acquire and release). In a
|
|
wavefront, all lanes execute the same sequence of actions, but they may target
|
|
different addresses. One can think of an episode as a critical section which
|
|
is bounded by a lock acquire in the beginning and a lock release at the end. An
|
|
episode consists of actions in the following order:
|
|
|
|
1 - Atomic action
|
|
2 - Acquire action
|
|
3 - A number of load and store actions
|
|
4 - Release action
|
|
5 - Atomic action that targets the same address as (1) does
|
|
|
|
There are two separate set of addresses: atomic and non-atomic. Atomic actions
|
|
target only atomic addresses. Load and store actions target only non-atomic
|
|
addresses. Memory addresses are all 4-byte aligned in the tester.
|
|
|
|
To test false sharing cases in which both atomic and non-atomic addresses are
|
|
placed in the same cache line, we abstract out the concept of memory addresses
|
|
from the tester's perspective by introducing the concept of location. Locations
|
|
are numbered from 0 to N-1 (if there are N addresses). The first X locations
|
|
[0..X-1] are atomic locations, and the rest are non-atomic locations.
|
|
The 1-1 mapping between locations and addresses are randomly created when the
|
|
tester is initialized.
|
|
|
|
Per load and store action, its target location is selected so that there is no
|
|
data race in the generated stream of memory requests at any time during the
|
|
test. Since in Data-Race-Free model, the memory system's behavior is undefined
|
|
in data race cases, we exclude data race scenarios from our protocol test.
|
|
|
|
Once location per load/store action is determined, each thread/wavefront either
|
|
loads current value at the location or stores an incremental value to that
|
|
location. The tester maintains a table tracking all last writers and their
|
|
written values, so we know what value should be returned from a load and what
|
|
value should be written next at a particular location. Value returned from a
|
|
load must match with the value written by the last writer.
|
|
|
|
----- Directory Structure -----
|
|
|
|
ProtocolTester.hh/cc -- This is the main tester class that orchestrates the
|
|
entire test.
|
|
AddressManager.hh/cc -- This manages address space, randomly maps address to
|
|
location, generates locations for all episodes,
|
|
maintains per-location last writer and validates
|
|
values returned from load actions.
|
|
TesterThread.hh/cc -- This is abstract class for CPU threads and GPU
|
|
wavefronts. It generates and executes a series of
|
|
episodes.
|
|
CpuThread.hh/cc -- Thread class for CPU threads. Not fully implemented yet
|
|
GpuWavefront.hh/cc -- Thread class for GPU wavefronts.
|
|
Episode.hh/cc -- Class to encapsulate an episode, notably including
|
|
episode load/store structure and ordering.
|
|
|
|
For more detail, please see the following paper:
|
|
|
|
T. Ta, X. Zhang, A. Gutierrez and B. M. Beckmann, "Autonomous Data-Race-Free
|
|
GPU Testing," 2019 IEEE International Symposium on Workload Characterization
|
|
(IISWC), Orlando, FL, USA, 2019, pp. 81-92, doi:
|
|
10.1109/IISWC47752.2019.9042019.
|