PIM overview

2024-02-04 22:54:22 +01:00
parent e2cbec5644
commit 9bf055ba97
3 changed files with 93 additions and 65 deletions


@@ -9,88 +9,47 @@
\label{sec:pim_workloads}
As already discussed in Section \ref{sec:introduction}, \ac{pim} is a good fit for accelerating memory-bound workloads.
In contrast, compute-bound workloads tend to have high data reuse and can make extensive use of the on-chip cache, and therefore do not need to utilize the full memory bandwidth.
For problems like this, \ac{pim} is only of limited use.
Many layers of modern \acp{dnn} can be expressed as a matrix-vector multiplication.
The layer inputs can be represented as a vector and the model weights can be viewed as a matrix, where the number of columns is equal to the size of the input vector and the number of rows is equal to the size of the output vector.
Each entry of the output vector is obtained by multiplying the input vector element-wise with the corresponding row of the matrix and summing the products, i.e., by computing their dot product.
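Writing $i_k$, $o_j$ and $w_{j,k}$ for the inputs, outputs and weights, and assuming a layer with $N$ inputs and $M$ outputs (symbols introduced here only for illustration, ignoring any bias term or activation function), each output entry can be sketched as
\begin{equation}
o_j = \sum_{k=0}^{N-1} w_{j,k} \, i_k , \qquad j = 0, \dots, M-1 ,
\end{equation}
so that, for a layer with four inputs, $o_1 = w_{1,0} i_0 + w_{1,1} i_1 + w_{1,2} i_2 + w_{1,3} i_3$.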
This process is illustrated in Figure \ref{img:dnn}, where one \ac{dnn} layer is processed.
\begin{figure}
\centering
\input{images/dnn}
\caption[A fully connected \ac{dnn} layer]{A fully connected \ac{dnn} layer \cite{he2020}.}
\label{img:dnn}
\end{figure}
Such an operation, defined in the widely used \ac{blas} library \cite{blas1979}, is also known as a \acs{gemv} routine.
% describe matrix operations for DNNs here
% memory-boundness
% BLAS kernels and so on...
Because each matrix element is used exactly once in the calculation of the output vector, there is no data reuse of the matrix.
Further, as the weight matrices tend to be too large to fit in the on-chip cache, such a \ac{gemv} operation is deeply memory-bound \cite{he2020}.
As a result, such an operation is a good fit for \ac{pim}.
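As a rough, back-of-the-envelope estimate of this memory-boundness, assume an $M \times N$ weight matrix whose elements occupy $b$ bytes each, and assume that every operand is transferred from or to memory exactly once (all of these are simplifying assumptions made here for illustration).
The operation then performs $2MN$ arithmetic operations, one multiplication and one addition per weight, so its arithmetic intensity is approximately
\begin{equation}
I = \frac{2MN}{b \, (MN + N + M)} \approx \frac{2}{b} \; \mathrm{FLOP/byte} \qquad \mathrm{for\ large\ } M, N ,
\end{equation}
which does not grow with the matrix size; with 16-bit weights ($b = 2$), this amounts to roughly one operation per byte transferred.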
\subsection{PIM Architectures}
\label{sec:pim_architectures}
Many different \ac{pim} architectures have been proposed by researchers in the past, and more recently hardware vendors have also presented real implementations.
These proposals differ largely in where the processing operation is applied, ranging from the analogue distribution of capacitor charges at the \ac{subarray} level to additional processing units at the global \ac{io} level.
In essence, the possible placements can be summarised as follows \cite{sudarshan2022}:
\begin{enumerate}
\item Inside the memory \ac{subarray}.
\item At the \ac{psa} region near a \ac{subarray}.
\item Outside of the bank in its peripheral region.
\item At the \ac{io} region of the memory.
\end{enumerate}
Each of these approaches comes with different advantages and disadvantages.
In short, the closer the processing happens to the memory \acs{subarray}, the higher the achievable processing bandwidth.
At the same time, however, integrating the \ac{pim} units becomes more difficult, as area and power constraints limit what can be integrated.
% short overview of the categories of PIM (paper from the chair)
In the following, three \ac{pim} approaches are highlighted in more detail.
\subsection{UPMEM}
\label{sec:pim_upmem}

src/images/dnn.tex Normal file

@@ -0,0 +1,68 @@
\begin{tikzpicture}
\node[circle,thick,draw=red!60,fill=blue!20,minimum size=5mm,anchor=center] (inode0) at (0,0) {$i_0$};
\node[circle,thick,draw=red!60,fill=blue!30,minimum size=5mm] (inode1) [below of=inode0] {$i_1$};
\node[circle,thick,draw=red!60,fill=blue!40,minimum size=5mm] (inode2) [below of=inode1] {$i_2$};
\node[circle,thick,draw=red!60,fill=blue!50,minimum size=5mm] (inode3) [below of=inode2] {$i_3$};
\node[circle,draw=black,fill=ForestGreen!20,minimum size=5mm,anchor=center] (onode0) at (2cm,0.5cm) {$o_0$};
\node[circle,thick,draw=red!60,fill=ForestGreen!30,minimum size=5mm] (onode1) [below of=onode0] {$o_1$};
\node[circle,draw=black,fill=ForestGreen!40,minimum size=5mm] (onode2) [below of=onode1] {$o_2$};
\node[circle,draw=black,fill=ForestGreen!50,minimum size=5mm] (onode3) [below of=onode2] {$o_3$};
\node[circle,draw=black,fill=ForestGreen!60,minimum size=5mm] (onode4) [below of=onode3] {$o_4$};
\draw (inode0.east) to (onode0.west);
\draw (inode1.east) to (onode0.west);
\draw (inode2.east) to (onode0.west);
\draw (inode3.east) to (onode0.west);
\draw (inode0.east) to (onode2.west);
\draw (inode1.east) to (onode2.west);
\draw (inode2.east) to (onode2.west);
\draw (inode3.east) to (onode2.west);
\draw (inode0.east) to (onode3.west);
\draw (inode1.east) to (onode3.west);
\draw (inode2.east) to (onode3.west);
\draw (inode3.east) to (onode3.west);
\draw (inode0.east) to (onode4.west);
\draw (inode1.east) to (onode4.west);
\draw (inode2.east) to (onode4.west);
\draw (inode3.east) to (onode4.west);
\draw[red!60,thick] (inode0.east) to (onode1.west);
\draw[red!60,thick] (inode1.east) to (onode1.west);
\draw[red!60,thick] (inode2.east) to (onode1.west);
\draw[red!60,thick] (inode3.east) to (onode1.west);
\matrix (matrix) [matrix of nodes,left delimiter=(,right delimiter=),right of=onode2,node distance=3.5cm] {
$w_{0,0}$ & $w_{0,1}$ & $w_{0,2}$ & $w_{0,3}$ \\
$w_{1,0}$ & $w_{1,1}$ & $w_{1,2}$ & $w_{1,3}$ \\
$w_{2,0}$ & $w_{2,1}$ & $w_{2,2}$ & $w_{2,3}$ \\
$w_{3,0}$ & $w_{3,1}$ & $w_{3,2}$ & $w_{3,3}$ \\
$w_{4,0}$ & $w_{4,1}$ & $w_{4,2}$ & $w_{4,3}$ \\
};
\node (prod) [right of=matrix,node distance=2.6cm] {$*$};
\matrix (input_vector) [matrix of nodes,left delimiter=(,right delimiter=),right of=prod] {
$i_{0}$ \\
$i_{1}$ \\
$i_{2}$ \\
$i_{3}$ \\
};
\node (eq) [right of=input_vector,node distance=1.1cm] {$=$};
\matrix (output_vector) [matrix of nodes,left delimiter=(,right delimiter=),right of=eq,node distance=1.1cm] {
$o_{0}$ \\
$o_{1}$ \\
$o_{2}$ \\
$o_{3}$ \\
$o_{4}$ \\
};
\node[draw,thick,red!60,rounded corners,inner sep=0,fit=(matrix-2-1) (matrix-2-4)] {};
\node[draw,thick,red!60,rounded corners,inner sep=0,fit=(input_vector-1-1) (input_vector-4-1)] {};
\node[draw,thick,red!60,rounded corners,inner sep=0,fit=(output_vector-2-1) (output_vector-2-1)] {};
\end{tikzpicture}


@@ -24,6 +24,7 @@
% Configurations
\usetikzlibrary{matrix}
\usetikzlibrary{automata}
\usetikzlibrary{fit}
\setlength\textheight{24cm}
\setkomafont{paragraph}{\footnotesize}