Add wallclock-time plots

2024-03-25 16:37:22 +01:00
parent 19bb7513af
commit 407848ada7
5 changed files with 56 additions and 2 deletions


@@ -8,7 +8,7 @@
ymin=0,
ymax=20,
ymajorgrids,
-ylabel={Relative Performance},
+ylabel={Speedup},
tick pos=left,
xtick=data,
xticklabels from table={\csv}{level},


@@ -8,7 +8,7 @@
ymin=0,
ymax=20,
ymajorgrids,
-ylabel={Relative Performance},
+ylabel={Speedup},
tick pos=left,
xtick=data,
xticklabels from table={\csv}{level},

3
plots/wallclock_time.csv Normal file

@@ -0,0 +1,3 @@
system,haxpy,dnn,gemv,vmul,vadd
PIM-HBM,135.58,49.17,256.96,128.43,136.81
HBM,3084.85,941.57,6788.0,1701.99,2136.37

40
plots/wallclock_time.tex Normal file

@@ -0,0 +1,40 @@
\begin{tikzpicture}
\pgfplotstableread[col sep=comma]{plots/wallclock_time.csv}\csv
\begin{axis}[
width=10cm,
height=4cm,
ybar=1pt,
bar width = 5pt,
ymin=0,
ymax=1e4,
ymode=log,
ymajorgrids,
ylabel={Runtime [s]},
tick pos=left,
xtick=data,
xticklabels from table={\csv}{system},
enlarge x limits=0.5,
legend style={
at={(current bounding box.south-|current axis.south)},
anchor=north,
legend columns=-1,
draw=none,
/tikz/every even column/.append style={column sep=0.5cm}
},
]
\addplot[fill=_darkblue!90] table [x expr=\coordindex, y={vadd}]{\csv};
\addlegendentry{VADD}
\addplot[fill=_blue!90] table [x expr=\coordindex, y={vmul}]{\csv};
\addlegendentry{VMUL}
\addplot[fill=_green!90] table [x expr=\coordindex, y={haxpy}]{\csv};
\addlegendentry{HAXPY}
\addplot[fill=_orange!90] table [x expr=\coordindex, y={gemv}]{\csv};
\addlegendentry{GEMV}
\addplot[fill=yellow!90] table [x expr=\coordindex, y={dnn}]{\csv};
\addlegendentry{DNN}
\end{axis}
\end{tikzpicture}


@@ -351,6 +351,17 @@ However, this memory barrier has also been implemented in our VADD kernel, which
The \ac{gemv} microbenchmark on the other hand shows a more matching result with an average speedup value of $\qty{8.3}{\times}$ for Samsung's real system and \qty{2.6}{\times} for their virtual prototype, while this paper achieved an average speedup of $\qty{9.0}{\times}$, which is well within the reach of the real hardware implementation.
\begin{figure}
\centering
\input{plots/wallclock_time}
\caption{Runtimes of the simulation workloads on the host system.}
\label{fig:wallclock_time}
\end{figure}
\Cref{fig:wallclock_time} shows the simulation runtimes of the various workloads on the host system.
With \ac{pim} enabled, the runtime drops by more than an order of magnitude for every workload (between roughly \qty{13}{\times} for VMUL and \qty{26}{\times} for \ac{gemv}), reflecting the reduced simulation effort for gem5's complex processor model, which only issues new memory requests while \ac{pim} is operating.
Therefore, exploring the effectiveness of different \ac{pim}-enabled workloads may be less time-consuming due to the reduced simulation complexity.
\section{Conclusion}
% TODO Lukas/Matthias
%
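
The runtime reduction described in the added paragraph can be checked directly against the data in `plots/wallclock_time.csv`. The following is a minimal sketch, not part of the commit: it embeds the CSV contents verbatim and divides the HBM runtime by the PIM-HBM runtime for each workload. The function name `runtime_ratios` and the inlined `CSV_TEXT` constant are illustrative assumptions.

```python
import csv
import io

# Contents of plots/wallclock_time.csv as added in this commit (runtimes in seconds).
CSV_TEXT = """system,haxpy,dnn,gemv,vmul,vadd
PIM-HBM,135.58,49.17,256.96,128.43,136.81
HBM,3084.85,941.57,6788.0,1701.99,2136.37
"""


def runtime_ratios(csv_text):
    """Return the HBM / PIM-HBM runtime ratio for each workload (hypothetical helper)."""
    rows = {row["system"]: row for row in csv.DictReader(io.StringIO(csv_text))}
    pim, hbm = rows["PIM-HBM"], rows["HBM"]
    return {
        workload: float(hbm[workload]) / float(pim[workload])
        for workload in pim
        if workload != "system"
    }


ratios = runtime_ratios(CSV_TEXT)

# Print the workloads from the smallest to the largest runtime reduction.
for workload, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{workload}: {ratio:.1f}x")
```

With the committed values, every ratio exceeds 10. VMUL shows the smallest reduction and GEMV the largest.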