/* * Copyright (c) 2017-2021 Advanced Micro Devices, Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: * * 1. Redistributions of source code must retain the above copyright notice, * this list of conditions and the following disclaimer. * * 2. Redistributions in binary form must reproduce the above copyright notice, * this list of conditions and the following disclaimer in the documentation * and/or other materials provided with the distribution. * * 3. Neither the name of the copyright holder nor the names of its * contributors may be used to endorse or promote products derived from this * software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. */ This directory contains a tester for gem5 GPU protocols. Unlike the Ruby random teter, this tester does not rely on sequential consistency. Instead, it assumes tested protocols supports release consistency. ----- Getting Started ----- To start using the tester quickly, you can use the following example command line to get running immediately: build/GCN3_X86/gem5.opt configs/example/ruby_gpu_random_test.py \ --test-length=1000 --system-size=medium --cache-size=small An overview of the main command line options is as follows. For all options use `build/GCN3_X86/gem5.opt configs/example/ruby_gpu_random_test.py --help` or see the configuration file. * --cache-size (small, large): Use smaller sizes for testing evict, etc. * --system-size (small, medium, large): Effectively the number of threads in the GPU model. Large size will have more contention. Larger sizes are useful for checking contention. * --episode-length (short, medium, long): Number of loads and stores in an episode. Episodes will also have atomics mixed in. See below for a definition of episode. * --test-length (int): Number of episodes to execute. This will determine the amount of time the tester runs for. Longer time will stress the protocol harder. The remainder of this file describes the theory behind the tester design and a link to a more detailed research paper is provided at the end. ----- Theory Overview ----- The GPU Ruby tester creates a system consisting of both CPU threads and GPU wavefronts. CPU threads are scalar, so there is one lane per CPU thread. GPU wavefront may have multiple lanes. The number of lanes is initialized when a thread/wavefront is created. Each thread/wavefront executes a number of episodes. Each episode is a series of memory actions (i.e., atomic, load, store, acquire and release). In a wavefront, all lanes execute the same sequence of actions, but they may target different addresses. One can think of an episode as a critical section which is bounded by a lock acquire in the beginning and a lock release at the end. An episode consists of actions in the following order: 1 - Atomic action 2 - Acquire action 3 - A number of load and store actions 4 - Release action 5 - Atomic action that targets the same address as (1) does There are two separate set of addresses: atomic and non-atomic. Atomic actions target only atomic addresses. Load and store actions target only non-atomic addresses. Memory addresses are all 4-byte aligned in the tester. To test false sharing cases in which both atomic and non-atomic addresses are placed in the same cache line, we abstract out the concept of memory addresses from the tester's perspective by introducing the concept of location. Locations are numbered from 0 to N-1 (if there are N addresses). The first X locations [0..X-1] are atomic locations, and the rest are non-atomic locations. The 1-1 mapping between locations and addresses are randomly created when the tester is initialized. Per load and store action, its target location is selected so that there is no data race in the generated stream of memory requests at any time during the test. Since in Data-Race-Free model, the memory system's behavior is undefined in data race cases, we exclude data race scenarios from our protocol test. Once location per load/store action is determined, each thread/wavefront either loads current value at the location or stores an incremental value to that location. The tester maintains a table tracking all last writers and their written values, so we know what value should be returned from a load and what value should be written next at a particular location. Value returned from a load must match with the value written by the last writer. ----- Directory Structure ----- ProtocolTester.hh/cc -- This is the main tester class that orchestrates the entire test. AddressManager.hh/cc -- This manages address space, randomly maps address to location, generates locations for all episodes, maintains per-location last writer and validates values returned from load actions. TesterThread.hh/cc -- This is abstract class for CPU threads and GPU wavefronts. It generates and executes a series of episodes. CpuThread.hh/cc -- Thread class for CPU threads. Not fully implemented yet GpuWavefront.hh/cc -- Thread class for GPU wavefronts. Episode.hh/cc -- Class to encapsulate an episode, notably including episode load/store structure and ordering. For more detail, please see the following paper: T. Ta, X. Zhang, A. Gutierrez and B. M. Beckmann, "Autonomous Data-Race-Free GPU Testing," 2019 IEEE International Symposium on Workload Characterization (IISWC), Orlando, FL, USA, 2019, pp. 81-92, doi: 10.1109/IISWC47752.2019.9042019.