diff --git a/README.md b/README.md
index c2b5e19..fb12539 100644
--- a/README.md
+++ b/README.md
@@ -88,9 +88,9 @@ To run the benchmark call:
 The benchmark will output the results similar to the stream benchmark. Results are validated.
 For threaded execution it is recommended to control thread affinity.
 
-We recommend to use likwid-pin for benchmarking:
+We recommend using likwid-pin to set the number of threads and to control thread affinity:
 ```
-likwid-pin -c 0-3 ./bwbench-GCC
+likwid-pin -C 0-3 ./bwbench-GCC
 ```
 
 Example output for threaded execution:
@@ -118,3 +118,42 @@ SDaxpy:        46822.63    23411.32      0.0281       0.0273       0.0325
 Solution Validates
 ```
 
+## Scaling runs
+
+Apart from the highest sustained memory bandwidth, the scaling behavior within memory domains is often also an important system property.
+
+There is a helper script included in util (```extractResults.pl```) that creates a text result file from multiple runs. This file can be used as input to plotting applications such as gnuplot and xmgrace.
+This involves two steps: executing the benchmark runs and then creating the data file.
+
+To run the benchmark for different thread counts within a memory domain execute (this assumes bash or zsh):
+```
+$ for nt in 1 2 4 6 8 10; do likwid-pin -q -C E:M0:$nt:1:2 ./bwbench-ICC > dat/emmy-$nt.txt; done
+```
+
+It is recommended to use just one thread per core if the processor supports hyperthreading.
+Use whatever stepping you like; here a stepping of two was used.
+The ```-q``` option suppresses output from ```likwid-pin```.
+The line above uses the expression-based syntax. On systems with hyperthreading enabled (check with, e.g., ```likwid-topology```) you have to skip the other hardware threads on each core.
+For the system above with 2 hardware threads per core this results in ```-C E:M0:$nt:1:2```; on a system with 4 hardware threads per core you would need ```-C E:M0:$nt:1:4```.
+The string before the dash (here emmy) can be arbitrary, but after the dash the extraction script expects the thread count.
+Also the file ending has to be ```.txt```.
+Please inspect some of the result files with a text editor to verify that everything worked.
+
+To extract the results and output them in a plottable format execute:
+```
+./extractResults.pl ./dat
+```
+
+The script will pick up all result files in the directory specified and create a column format output file.
+In this case:
+```
+#nt	Init	Sum	Copy	Update	Triad	Daxpy	STriad	SDaxpy
+1	4109	11900	5637	8025	7407	9874	8981	11288
+2	8057	22696	11011	15174	14821	18786	17599	21475
+4	15602	39327	21020	28197	27287	33633	31939	37146
+6	22592	45877	29618	37155	36664	40259	39911	41546
+8	28641	46878	35763	40111	40106	41293	41022	41950
+10	33151	46741	38187	40269	39960	40922	40567	41606
+```
+
+Please be aware that the single core memory bandwidth as well as the scaling behavior depend on the frequency settings.