PIM overview
@@ -9,88 +9,47 @@
|
||||
\label{sec:pim_workloads}
|
||||
|
||||
As already discussed in Section \ref{sec:introduction}, \ac{pim} is a good fit for accelerating memory-bound workloads.
|
||||
In contrast, compute-bound workloads tend to have high data reuse and can make excessive use of the on-chip cache, and therefore do not need to use the full memory bandwidth.
|
||||
For problems like this, \ac{pim} is of only limited use.
|
||||
In contrast, compute-bound workloads tend to have high data reuse and can make extensive use of the on-chip cache, and therefore do not need to utilize the full memory bandwidth.
|
||||
For problems like this, \ac{pim} is only of limited use.
|
||||
|
||||
Many layers of modern \acp{dnn} can be expressed as a matrix-vector multiplication.
|
||||
The layer inputs can be represented as a vector and the model weights can be viewed as a matrix, where the number of columns is equal to the size of the input vector and the number of rows is equal to the size of the output vector.
|
||||
Multiplying the input vector element-wise with a row of the matrix and summing the products yields one entry of the output vector.
|
||||
This process is illustrated for in Figure \ref{img:dnn}.
|
||||
This process is illustrated in Figure \ref{img:dnn}, where one \ac{dnn} layer is processed.
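Written out, and assuming a layer without a bias term or activation function, each output entry is the dot product of one row of the weight matrix with the input vector. The symbols $m$ and $n$, denoting the number of outputs and inputs, are introduced here for illustration only:
\begin{equation}
  o_j = \sum_{k=0}^{n-1} w_{j,k} \, i_k, \qquad j = 0, \dots, m-1
\end{equation}
For the layer shown in Figure \ref{img:dnn} with $n = 4$ and $m = 5$, the highlighted entry is $o_1 = w_{1,0} i_0 + w_{1,1} i_1 + w_{1,2} i_2 + w_{1,3} i_3$.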
|
||||
|
||||
\begin{figure}
|
||||
\centering
|
||||
\begin{tikzpicture}
|
||||
\node[circle,thick,draw=red!60,fill=blue!20,minimum size=5mm,anchor=center] (inode0) at (0,0) {$i_0$};
|
||||
\node[circle,thick,draw=red!60,fill=blue!30,minimum size=5mm] (inode1) [below of=inode0] {$i_1$};
|
||||
\node[circle,thick,draw=red!60,fill=blue!40,minimum size=5mm] (inode2) [below of=inode1] {$i_2$};
|
||||
\node[circle,thick,draw=red!60,fill=blue!50,minimum size=5mm] (inode3) [below of=inode2] {$i_3$};
|
||||
|
||||
\node[circle,draw=black,fill=ForestGreen!20,minimum size=5mm,anchor=center] (onode0) at (2cm,0.5cm) {$o_0$};
|
||||
\node[circle,thick,draw=red!60,fill=ForestGreen!30,minimum size=5mm] (onode1) [below of=onode0] {$o_1$};
|
||||
\node[circle,draw=black,fill=ForestGreen!40,minimum size=5mm] (onode2) [below of=onode1] {$o_2$};
|
||||
\node[circle,draw=black,fill=ForestGreen!50,minimum size=5mm] (onode3) [below of=onode2] {$o_3$};
|
||||
\node[circle,draw=black,fill=ForestGreen!60,minimum size=5mm] (onode4) [below of=onode3] {$o_4$};
|
||||
|
||||
\draw (inode0.east) to (onode0.west);
|
||||
\draw (inode1.east) to (onode0.west);
|
||||
\draw (inode2.east) to (onode0.west);
|
||||
\draw (inode3.east) to (onode0.west);
|
||||
|
||||
\draw (inode0.east) to (onode2.west);
|
||||
\draw (inode1.east) to (onode2.west);
|
||||
\draw (inode2.east) to (onode2.west);
|
||||
\draw (inode3.east) to (onode2.west);
|
||||
|
||||
\draw (inode0.east) to (onode3.west);
|
||||
\draw (inode1.east) to (onode3.west);
|
||||
\draw (inode2.east) to (onode3.west);
|
||||
\draw (inode3.east) to (onode3.west);
|
||||
|
||||
\draw (inode0.east) to (onode4.west);
|
||||
\draw (inode1.east) to (onode4.west);
|
||||
\draw (inode2.east) to (onode4.west);
|
||||
\draw (inode3.east) to (onode4.west);
|
||||
|
||||
\draw[red!60,thick] (inode0.east) to (onode1.west);
|
||||
\draw[red!60,thick] (inode1.east) to (onode1.west);
|
||||
\draw[red!60,thick] (inode2.east) to (onode1.west);
|
||||
\draw[red!60,thick] (inode3.east) to (onode1.west);
|
||||
|
||||
\matrix (matrix) [matrix of nodes,left delimiter=(,right delimiter=),right of=onode2,node distance=4cm] {
|
||||
$w_{0,0}$ & $w_{0,1}$ & $w_{0,2}$ & $w_{0,3}$ \\
|
||||
$w_{1,0}$ & $w_{1,1}$ & $w_{1,2}$ & $w_{1,3}$ \\
|
||||
$w_{2,0}$ & $w_{2,1}$ & $w_{2,2}$ & $w_{2,3}$ \\
|
||||
$w_{3,0}$ & $w_{3,1}$ & $w_{3,2}$ & $w_{3,3}$ \\
|
||||
$w_{4,0}$ & $w_{4,1}$ & $w_{4,2}$ & $w_{4,3}$ \\
|
||||
};
|
||||
|
||||
\node[draw,thick,red!60,rounded corners,inner sep=0,fit=(matrix-2-1) (matrix-2-4)] {};
|
||||
|
||||
\node (prod) [right of=matrix,node distance=2.6cm] {$*$};
|
||||
|
||||
\matrix (vector) [matrix of nodes,left delimiter=(,right delimiter=),right of=prod] {
|
||||
$i_{0}$ \\
|
||||
$i_{1}$ \\
|
||||
$i_{2}$ \\
|
||||
$i_{3}$ \\
|
||||
};
|
||||
|
||||
\node (eq) [right of=vector,node distance=1.2cm] {$=$};
|
||||
\end{tikzpicture}
|
||||
\caption[]{\cite{he2020}}
|
||||
\input{images/dnn}
|
||||
\caption[A fully connected \ac{dnn} layer]{A fully connected \ac{dnn} layer \cite{he2020}.}
|
||||
\label{img:dnn}
|
||||
\end{figure}
|
||||
|
||||
Such an operation, defined in the widely used \ac{blas} library \cite{blas1979}, is also known as a \acs{gemv} routine.
|
||||
% describe matrix operations for DNNs here
|
||||
% memory-boundness
|
||||
% BLAS kernels and so on...
|
||||
Because each matrix element is used exactly once in the calculation of the output vector, there is no data reuse of the matrix.
|
||||
Furthermore, as the weight matrices tend to be too large to fit in the on-chip cache, such a \ac{gemv} operation is deeply memory-bound \cite{he2020}.
|
||||
As a result, such an operation is a good fit for \ac{pim}.
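A rough estimate makes the memory-boundness concrete. Assume an $m \times n$ weight matrix whose elements occupy $b$ bytes each and are each read from memory exactly once (these symbols are introduced here for illustration only). The \ac{gemv} operation then performs approximately $2mn$ arithmetic operations, one multiplication and one addition per matrix element, while moving about $b\,(mn + m + n)$ bytes:
\begin{equation}
  \frac{2mn}{b\,(mn + m + n)} \approx \frac{2}{b} \quad \mathrm{operations\ per\ byte} \qquad (m, n \gg 1)
\end{equation}
With 16-bit weights ($b = 2$), this amounts to roughly one operation per byte of memory traffic, independent of the matrix size.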
|
||||
|
||||
\subsection{PIM Architectures}
|
||||
\label{sec:pim_architectures}
|
||||
|
||||
Many different \ac{pim} architectures have been proposed by researchers in the past, and more recently hardware vendors have also presented real implementations.
|
||||
These proposals differ largely in where the processing operation is applied, ranging from the analogue distribution of capacitor charges at the \ac{subarray} level to additional processing units at the global \ac{io} level.
|
||||
In essence, the possible placements can be summarised as follows \cite{sudarshan2022}:
|
||||
|
||||
\begin{enumerate}
|
||||
\item Inside the memory \ac{subarray}.
|
||||
\item At the \ac{psa} region near a \ac{subarray}.
|
||||
\item Outside of the bank in its peripheral region.
|
||||
\item At the \ac{io} region of the memory.
|
||||
\end{enumerate}
|
||||
|
||||
Each of these approaches comes with different advantages and disadvantages.
|
||||
In short, the closer the processing happens to the memory \acs{subarray}, the higher the achievable processing bandwidth.
|
||||
However, integrating the \ac{pim} units also becomes more difficult, as area and power constraints become more restrictive.
|
||||
% short overview of the PIM categories (paper from the chair)
|
||||
|
||||
In the following, three \ac{pim} approaches are highlighted in more detail.
|
||||
|
||||
\subsection{UPMEM}
|
||||
\label{sec:pim_upmem}
|
||||
|
||||
|
||||
68
src/images/dnn.tex
Normal file
@@ -0,0 +1,68 @@
|
||||
\begin{tikzpicture}
|
||||
\node[circle,thick,draw=red!60,fill=blue!20,minimum size=5mm,anchor=center] (inode0) at (0,0) {$i_0$};
|
||||
\node[circle,thick,draw=red!60,fill=blue!30,minimum size=5mm] (inode1) [below of=inode0] {$i_1$};
|
||||
\node[circle,thick,draw=red!60,fill=blue!40,minimum size=5mm] (inode2) [below of=inode1] {$i_2$};
|
||||
\node[circle,thick,draw=red!60,fill=blue!50,minimum size=5mm] (inode3) [below of=inode2] {$i_3$};
|
||||
|
||||
\node[circle,draw=black,fill=ForestGreen!20,minimum size=5mm,anchor=center] (onode0) at (2cm,0.5cm) {$o_0$};
|
||||
\node[circle,thick,draw=red!60,fill=ForestGreen!30,minimum size=5mm] (onode1) [below of=onode0] {$o_1$};
|
||||
\node[circle,draw=black,fill=ForestGreen!40,minimum size=5mm] (onode2) [below of=onode1] {$o_2$};
|
||||
\node[circle,draw=black,fill=ForestGreen!50,minimum size=5mm] (onode3) [below of=onode2] {$o_3$};
|
||||
\node[circle,draw=black,fill=ForestGreen!60,minimum size=5mm] (onode4) [below of=onode3] {$o_4$};
|
||||
|
||||
\draw (inode0.east) to (onode0.west);
|
||||
\draw (inode1.east) to (onode0.west);
|
||||
\draw (inode2.east) to (onode0.west);
|
||||
\draw (inode3.east) to (onode0.west);
|
||||
|
||||
\draw (inode0.east) to (onode2.west);
|
||||
\draw (inode1.east) to (onode2.west);
|
||||
\draw (inode2.east) to (onode2.west);
|
||||
\draw (inode3.east) to (onode2.west);
|
||||
|
||||
\draw (inode0.east) to (onode3.west);
|
||||
\draw (inode1.east) to (onode3.west);
|
||||
\draw (inode2.east) to (onode3.west);
|
||||
\draw (inode3.east) to (onode3.west);
|
||||
|
||||
\draw (inode0.east) to (onode4.west);
|
||||
\draw (inode1.east) to (onode4.west);
|
||||
\draw (inode2.east) to (onode4.west);
|
||||
\draw (inode3.east) to (onode4.west);
|
||||
|
||||
\draw[red!60,thick] (inode0.east) to (onode1.west);
|
||||
\draw[red!60,thick] (inode1.east) to (onode1.west);
|
||||
\draw[red!60,thick] (inode2.east) to (onode1.west);
|
||||
\draw[red!60,thick] (inode3.east) to (onode1.west);
|
||||
|
||||
\matrix (matrix) [matrix of nodes,left delimiter=(,right delimiter=),right of=onode2,node distance=3.5cm] {
|
||||
$w_{0,0}$ & $w_{0,1}$ & $w_{0,2}$ & $w_{0,3}$ \\
|
||||
$w_{1,0}$ & $w_{1,1}$ & $w_{1,2}$ & $w_{1,3}$ \\
|
||||
$w_{2,0}$ & $w_{2,1}$ & $w_{2,2}$ & $w_{2,3}$ \\
|
||||
$w_{3,0}$ & $w_{3,1}$ & $w_{3,2}$ & $w_{3,3}$ \\
|
||||
$w_{4,0}$ & $w_{4,1}$ & $w_{4,2}$ & $w_{4,3}$ \\
|
||||
};
|
||||
|
||||
\node (prod) [right of=matrix,node distance=2.6cm] {$*$};
|
||||
|
||||
\matrix (input_vector) [matrix of nodes,left delimiter=(,right delimiter=),right of=prod] {
|
||||
$i_{0}$ \\
|
||||
$i_{1}$ \\
|
||||
$i_{2}$ \\
|
||||
$i_{3}$ \\
|
||||
};
|
||||
|
||||
\node (eq) [right of=input_vector,node distance=1.1cm] {$=$};
|
||||
|
||||
\matrix (output_vector) [matrix of nodes,left delimiter=(,right delimiter=),right of=eq,node distance=1.1cm] {
|
||||
$o_{0}$ \\
|
||||
$o_{1}$ \\
|
||||
$o_{2}$ \\
|
||||
$o_{3}$ \\
|
||||
$o_{4}$ \\
|
||||
};
|
||||
|
||||
\node[draw,thick,red!60,rounded corners,inner sep=0,fit=(matrix-2-1) (matrix-2-4)] {};
|
||||
\node[draw,thick,red!60,rounded corners,inner sep=0,fit=(input_vector-1-1) (input_vector-4-1)] {};
|
||||
\node[draw,thick,red!60,rounded corners,inner sep=0,fit=(output_vector-2-1) (output_vector-2-1)] {};
|
||||
\end{tikzpicture}
|
||||
@@ -24,6 +24,7 @@
|
||||
|
||||
% Configurations
|
||||
\usetikzlibrary{matrix}
|
||||
\usetikzlibrary{automata}
|
||||
\usetikzlibrary{fit}
|
||||
\setlength\textheight{24cm}
|
||||
\setkomafont{paragraph}{\footnotesize}
|
||||
|
||||