tests,configs,mem-ruby: Adding Ruby tester for GPU_VIPER

This patch adds the GPU protocol tester that uses data-race-free
operation to discover bugs in GPU protocols including GPU_VIPER. For
more information please see the following paper and the README:

T. Ta, X. Zhang, A. Gutierrez and B. M. Beckmann, "Autonomous
Data-Race-Free GPU Testing," 2019 IEEE International Symposium on
Workload Characterization (IISWC), Orlando, FL, USA, 2019, pp. 81-92,
doi: 10.1109/IISWC47752.2019.9042019.

Change-Id: Ic9939d131a930d1e7014ed0290601140bdd1499f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32855
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-09-24 14:53:13 -05:00
parent 1a2b677728
commit f36817c367
19 changed files with 3498 additions and 103 deletions
--- a/src/cpu/testers/gpu_ruby_test/README
+++ b/src/cpu/testers/gpu_ruby_test/README
@@ -0,0 +1,129 @@
+/*
+ * Copyright (c) 2017-2020 Advanced Micro Devices, Inc.
+ * All rights reserved.
+ *
+ * For use for simulation and test purposes only
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ * this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright notice,
+ * this list of conditions and the following disclaimer in the documentation
+ * and/or other materials provided with the distribution.
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived from this
+ * software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+This directory contains a tester for gem5 GPU protocols. Unlike the Ruby random
+tester, this tester does not rely on sequential consistency. Instead, it
+assumes that the tested protocols support release consistency.
+
+----- Getting Started -----
+
+To get started quickly, use the following example command line:
+
+build/GCN3_X86/gem5.opt configs/example/ruby_gpu_random_test.py \
+            --test-length=1000 --system-size=medium --cache-size=small
+
+An overview of the main command line options is as follows. For all options
+use `build/GCN3_X86/gem5.opt configs/example/ruby_gpu_random_test.py --help`
+or see the configuration file.
+
+ * --cache-size (small, large): Use smaller sizes for testing evict, etc.
+ * --system-size (small, medium, large): Effectively the number of threads in
+                 the GPU model. Larger sizes produce more contention and are
+                 therefore useful for checking contended cases.
+ * --episode-length (short, medium, long): Number of loads and stores in an
+                 episode. Episodes will also have atomics mixed in. See below
+                 for a definition of episode.
+ * --test-length (int): Number of episodes to execute. This will determine the
+                 amount of time the tester runs for. Longer time will stress
+                 the protocol harder.
+
+The remainder of this file describes the theory behind the tester design. A
+reference to a more detailed research paper is provided at the end.
+
+----- Theory Overview -----
+
+The GPU Ruby tester creates a system consisting of both CPU threads and GPU
+wavefronts. CPU threads are scalar, so there is one lane per CPU thread. A GPU
+wavefront may have multiple lanes. The number of lanes is initialized when
+a thread/wavefront is created.
+
+Each thread/wavefront executes a number of episodes. Each episode is a series
+of memory actions (i.e., atomic, load, store, acquire and release). In a
+wavefront, all lanes execute the same sequence of actions, but they may target
+different addresses. One can think of an episode as a critical section which
+is bounded by a lock acquire in the beginning and a lock release at the end. An
+episode consists of actions in the following order:
+
+1 - Atomic action
+2 - Acquire action
+3 - A number of load and store actions
+4 - Release action
+5 - Atomic action that targets the same address as (1) does
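+
+As a minimal sketch of the ordering above (not the tester's actual C++ code;
+the function name make_episode and the tuple layout are hypothetical), an
+episode could be built like this in Python:
+
```python
import random

def make_episode(num_mem_ops, atomic_loc, rng):
    """Return one episode as an ordered list of (action, location) tuples."""
    actions = [("atomic", atomic_loc),      # 1 - atomic action
               ("acquire", None)]           # 2 - acquire action
    for _ in range(num_mem_ops):            # 3 - loads and stores
        # Load/store locations are left unset in this sketch; they would be
        # chosen later so that no data race is created.
        actions.append((rng.choice(["load", "store"]), None))
    actions.append(("release", None))       # 4 - release action
    actions.append(("atomic", atomic_loc))  # 5 - same address as (1)
    return actions

episode = make_episode(num_mem_ops=4, atomic_loc=0, rng=random.Random(1))
for action, loc in episode:
    print(action, loc)
```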
+
+There are two separate sets of addresses: atomic and non-atomic. Atomic actions
+target only atomic addresses. Load and store actions target only non-atomic
+addresses. Memory addresses are all 4-byte aligned in the tester.
+
+To test false sharing cases in which both atomic and non-atomic addresses are
+placed in the same cache line, we abstract out the concept of memory addresses
+from the tester's perspective by introducing the concept of location. Locations
+are numbered from 0 to N-1 (if there are N addresses). The first X locations
+[0..X-1] are atomic locations, and the rest are non-atomic locations.
+The one-to-one mapping between locations and addresses is created randomly
+when the tester is initialized.
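+
+The location abstraction can be illustrated with a short Python sketch. The
+names build_location_map and is_atomic_location, the base address of 0 and
+the 64-byte cache line are assumptions made for illustration; they are not
+taken from the tester's source:
+
```python
import random

def build_location_map(num_atomic, num_normal, base_addr=0, rng=None):
    """Return a list in which index = location and value = address."""
    rng = rng or random.Random()
    total = num_atomic + num_normal
    # All addresses are 4-byte aligned, as in the tester.
    addrs = [base_addr + 4 * i for i in range(total)]
    rng.shuffle(addrs)  # random one-to-one mapping
    return addrs

def is_atomic_location(loc, num_atomic):
    """Locations [0..num_atomic-1] are atomic; the rest are non-atomic."""
    return loc < num_atomic

NUM_ATOMIC, NUM_NORMAL = 4, 12
loc_to_addr = build_location_map(NUM_ATOMIC, NUM_NORMAL, rng=random.Random(0))

# With an assumed 64-byte cache line, all 16 addresses share line 0, so
# atomic and non-atomic locations falsely share a cache line.
lines = {addr // 64 for addr in loc_to_addr}
print(sorted(lines))
```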
+
+For each load and store action, the target location is selected so that the
+generated stream of memory requests contains no data race at any time during
+the test. Under the data-race-free model, the memory system's behavior is
+undefined when a data race occurs, so we exclude data-race scenarios from our
+protocol test.
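+
+One simple way to keep concurrent episodes race-free is sketched below in
+Python. This is an assumed rule for illustration, not necessarily the
+algorithm the tester uses: a location may be loaded by several in-flight
+episodes at once, but a store requires that no other in-flight episode loads
+or stores that location. The class name RaceFreePicker and its methods are
+hypothetical:
+
```python
class RaceFreePicker:
    """Tracks which in-flight episodes read or write each location."""

    def __init__(self, locations):
        self.readers = {loc: set() for loc in locations}
        self.writer = {loc: None for loc in locations}

    def claim(self, loc, episode, is_store):
        """Try to reserve loc for episode; return True on success."""
        # Any access conflicts with a store by a different episode.
        if self.writer[loc] not in (None, episode):
            return False
        if is_store:
            # A store also conflicts with loads by other episodes.
            if not self.readers[loc] <= {episode}:
                return False
            self.writer[loc] = episode
        else:
            self.readers[loc].add(episode)
        return True

    def finish(self, episode):
        """Release every location held by a completed episode."""
        for loc in self.readers:
            self.readers[loc].discard(episode)
            if self.writer[loc] == episode:
                self.writer[loc] = None

picker = RaceFreePicker(range(4))
print(picker.claim(0, "A", is_store=False))  # load by episode A
print(picker.claim(0, "B", is_store=True))   # conflicting store by B
```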
+
+Once the location for a load/store action is determined, the thread/wavefront
+either loads the current value at that location or stores an incremented value
+to it. The tester maintains a table that tracks the last writer of every
+location and the value it wrote, so we know what value a load should return
+and what value should be written next at a particular location. The value
+returned from a load must match the value written by the last writer.
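+
+The bookkeeping described above can be sketched in Python as follows. The
+class name LastWriterTable, its methods and the initial value of 0 are
+assumptions for illustration, not the tester's actual interface:
+
```python
class LastWriterTable:
    """Per-location record of the last writer and the value it wrote."""

    def __init__(self, num_locations):
        self.value = [0] * num_locations      # assumed initial value: 0
        self.writer = [None] * num_locations  # no writer yet

    def record_store(self, loc, thread_id):
        """Log a store of the next incremented value and return it."""
        self.value[loc] += 1
        self.writer[loc] = thread_id
        return self.value[loc]

    def check_load(self, loc, returned):
        """Raise if a load did not return the last written value."""
        if returned != self.value[loc]:
            raise AssertionError(
                f"location {loc}: got {returned}, expected "
                f"{self.value[loc]} from writer {self.writer[loc]}")

table = LastWriterTable(num_locations=8)
stored = table.record_store(loc=5, thread_id="wf0")
table.check_load(loc=5, returned=stored)  # passes: values match
```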
+
+----- Directory Structure -----
+
+ProtocolTester.hh/cc -- This is the main tester class that orchestrates the
+                        entire test.
+AddressManager.hh/cc -- This manages the address space, randomly maps addresses
+                        to locations, generates locations for all episodes,
+                        maintains the per-location last writer and validates
+                        values returned from load actions.
+GpuThread.hh/cc      -- This is the abstract class for CPU threads and GPU
+                        wavefronts. It generates and executes a series of
+                        episodes.
+CpuThread.hh/cc      -- Thread class for CPU threads. Not fully implemented yet.
+GpuWavefront.hh/cc   -- GpuThread class for GPU wavefronts.
+Episode.hh/cc        -- Class to encapsulate an episode, notably including
+                        episode load/store structure and ordering.
+
+For more detail, please see the following paper:
+
+T. Ta, X. Zhang, A. Gutierrez and B. M. Beckmann, "Autonomous Data-Race-Free
+GPU Testing," 2019 IEEE International Symposium on Workload Characterization
+(IISWC), Orlando, FL, USA, 2019, pp. 81-92, doi:
+10.1109/IISWC47752.2019.9042019.