Update README.md

2019-07-01 12:00:23 +02:00
parent d8b4feb50a
commit 1d2063065f
1 changed files with 2 additions and 112 deletions
--- a/README.md
+++ b/README.md
@@ -1,117 +1,7 @@
 # The Bandwidth Benchmark

 This is a collection of simple streaming kernels for teaching purposes.
-It is heavily inspired by John McCalpin's https://www.cs.virginia.edu/stream/ benchmark.

-It contains the following streaming kernels with corresponding data access pattern (Notation: S - store, L - load, WA - write allocate). All variables are vectors, s is a scalar:
+It consists of two benchmark applications:

-* init (S1, WA): Initialize an array: `a = s`. Store only.
-* sum (L1): Vector reduction: `s += a`. Load only.
-* copy  (L1, S1, WA): Classic memcopy: `a = b`.
-* update (L1, S1): Update vector: `a = a * scalar`. Also load + store but without write allocate.
-* triad (L2, S1, WA): Stream triad: `a = b + c * scalar`.
-* daxpy (L2, S1): Daxpy: `a = a + b * scalar`.
-* striad (L3, S1, WA): Schoenauer triad: `a = b + c * d`.
-* sdaxpy (L3, S1): Schoenauer triad without write allocate: `a = a + b * c`.
-
-As an added benefit, the code is a blueprint for a minimal benchmarking application with a generic makefile and modules for aligned array allocation, accurate timing, and affinity settings. These components can be used standalone in your own project.
-
-## Build
-
-1. Configure the toolchain and additional options in `config.mk`:
-```
-# Supported: GCC, CLANG, ICC
-TAG ?= GCC
-ENABLE_OPENMP ?= false
-
-OPTIONS  =  -DSIZE=40000000ull
-OPTIONS +=  -DNTIMES=10
-OPTIONS +=  -DARRAY_ALIGNMENT=64
-#OPTIONS +=  -DVERBOSE_AFFINITY
-#OPTIONS +=  -DVERBOSE_DATASIZE
-#OPTIONS +=  -DVERBOSE_TIMER
-```
-
-The verbosity options enable detailed output about affinity settings, allocation sizes and timer resolution.
-
-2. Build with:
-```
-make
-```
-
-You can build with multiple toolchains in the same directory, but note that the Makefile only acts on the one currently set. Intermediate build results are located in the `<TOOLCHAIN>` directory.
-
-To output the executed commands use:
-```
-make Q=
-```
-
-3. Clean up with:
-```
-make clean
-```
-to clean intermediate build results.
-
-```
-make distclean
-```
-to clean intermediate build results and binary.
-
-4. (Optional) Generate assembler:
-```
-make asm
-```
-The assembler files will also be located in the `<TOOLCHAIN>` directory.
-
-## Usage
-
-To run the benchmark call:
-```
-./bwbench-<TOOLCHAIN>
-```
-
-The benchmark outputs its results in a format similar to the STREAM benchmark. Results are validated.
-For threaded execution it is recommended to control thread affinity.
-
-We recommend using likwid-pin for benchmarking:
-```
-likwid-pin -c 0-3 ./bwbench-GCC  
-```
-
-Example output for threaded execution:
-```
-------------------------------------------------------------
-[pthread wrapper] 
-[pthread wrapper] MAIN -> 0
-[pthread wrapper] PIN_MASK: 0->1  1->2  2->3  
-[pthread wrapper] SKIP MASK: 0x0
-        threadid 140271463495424 -> core 1 - OK
-        threadid 140271455102720 -> core 2 - OK
-        threadid 140271446710016 -> core 3 - OK
-OpenMP enabled, running with 4 threads
----------------------------------------------------------------------------
-Function      Rate(MB/s)  Rate(MFlop/s)  Avg time     Min time     Max time
-Init:          22111.53    -             0.0148       0.0145       0.0165
-Sum:           46808.59    46808.59      0.0077       0.0068       0.0140
-Copy:          30983.06    -             0.0207       0.0207       0.0208
-Update:        43778.69    21889.34      0.0147       0.0146       0.0148
-Triad:         34476.64    22984.43      0.0282       0.0278       0.0305
-Daxpy:         45908.82    30605.88      0.0214       0.0209       0.0242
-STriad:        37502.37    18751.18      0.0349       0.0341       0.0388
-SDaxpy:        46822.63    23411.32      0.0281       0.0273       0.0325
----------------------------------------------------------------------------
-Solution Validates
-```
-
-A Perl wrapper script (`bench.pl`) is also provided to scan ranges of thread counts and determine the absolute highest sustained main memory bandwidth. To use it, `likwid-pin` has to be in your path. The script takes three required command line arguments and one optional one:
-```
-$./bench.pl <executable> <thread count range> <repetitions> [<SMT setting>]
-```
-Example usage:
-```
-$./bench.pl ./bwbench-GCC 2-8 6
-```
-The script always uses physical cores only and assumes two SMT threads per core by default. For a different SMT thread count, use the fourth command line argument. Example for a processor without SMT:
-```
-$./bench.pl ./bwbench-GCC 14-24  10  1
-```
+* [[MainMemory|https://github.com/RRZE-HPC/TheBandwidthBenchmark/wiki/MainMemory]]