More PIM overview
@@ -163,6 +163,14 @@
short = GEMM,
long = matrix matrix multiply,
}
\DeclareAcronym{mac}{
short = MAC,
long = multiply-accumulate,
}
\DeclareAcronym{mad}{
short = MAD,
long = multiply-add,
}
\DeclareAcronym{tlm}{
short = TLM,
long = transaction-level modeling,
@@ -8,7 +8,7 @@
\subsection{Applicable Workloads}
\label{sec:pim_workloads}

As already discussed in Section \ref{sec:introduction}, \ac{pim} is a good fit for accelerating memory-bound workloads.
As already discussed in Section \ref{sec:introduction}, \ac{pim} is a good fit for accelerating memory-bound workloads with low operational intensity.
In contrast, compute-bound workloads tend to have high data reuse and can make extensive use of the on-chip cache, so they do not need to utilize the full memory bandwidth.
For problems like this, \ac{pim} is only of limited use.
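As a rough illustration of this distinction, the operational intensity $I$ of a kernel can be written as the ratio of the operations it performs to the bytes it moves between memory and processor, and the roofline model bounds the attainable performance $P$ accordingly.
The symbols $W$ (operations), $Q$ (bytes moved), $P_{\mathrm{peak}}$ (peak compute throughput) and $B_{\mathrm{mem}}$ (memory bandwidth) are introduced here for illustration only.
\begin{equation}
I = \frac{W}{Q}, \qquad P \leq \min\left(P_{\mathrm{peak}},\; I \cdot B_{\mathrm{mem}}\right)
\end{equation}
Assuming 4-byte operands and counting only the matrix traffic, a matrix-vector product with an $n \times n$ matrix performs about $2n^2$ operations on $4n^2$ bytes, so $I \approx 0.5$ operations per byte regardless of $n$.
A matrix-matrix product, in contrast, performs about $2n^3$ operations on $12n^2$ bytes, so $I \approx n/6$ grows with the problem size.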
@@ -44,11 +44,28 @@ In essence, these placements of the approaches can be summarised as follows \cit
\end{enumerate}

Each of these approaches comes with different advantages and disadvantages.
In short, the nearer the processing happens to the memory \acs{subarray}, the higher is the achievable processing bandwidth.
But also, the integration of the \ac{pim} units becomes more difficult as area and power constraints restrict the integration.
In short, the nearer the processing happens to the memory \acs{subarray}, the higher the energy efficiency and the achievable processing bandwidth.
On the other hand, integrating the \ac{pim} units becomes more difficult because of the area and power constraints \cite{sudarshan2022}.
% short overview of the PIM categories (paper from the chair)

In the following, three \ac{pim} approaches are highlighted in more detail.
Processing inside the \ac{subarray} offers the highest achievable level of parallelism, with the number of operand bits equal to the row size.
It also requires the least amount of energy to load the data from the \acs{subarray} into the \acp{psa} to perform operations on it.
The downside of this approach is the need to alter the \ac{subarray} architecture, which is highly optimized for density.
One example of such an approach is Ambit \cite{seshadri2020}.
Ambit provides a mechanism to activate multiple rows within a \ac{subarray} at once and perform bulk bitwise operations such as AND, OR and NOT on the row data.
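As a sketch of the underlying idea, following the Ambit proposal: activating three rows $A$, $B$ and $C$ simultaneously lets each bitline settle to the majority value of the three cells connected to it, and presetting the control row $C$ selects the operation.
The row names are chosen here for illustration.
\begin{equation}
\mathrm{MAJ}(A, B, C) = (A \wedge B) \vee (B \wedge C) \vee (C \wedge A), \qquad \mathrm{MAJ}(A, B, 0) = A \wedge B, \qquad \mathrm{MAJ}(A, B, 1) = A \vee B
\end{equation}
The NOT operation cannot be derived from the majority function alone; Ambit realizes it with dedicated rows of dual-contact cells.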

Integrating compute units in the region of the \acp{psa} is subject to far fewer, but still challenging, constraints.
The authors of \cite{sudarshan2022a} introduce a two-stage design that integrates current-mirror-based analog units near the \ac{subarray}, enabling the \ac{mac} operations used in \ac{dnn} applications.

The integration of compute units at the \ac{io} region of the bank makes area-intensive operations such as ADD, \ac{mac} or \ac{mad} possible.
This leaves the highly optimized \ac{subarray} and \ac{psa} regions as they are and only lowers the memory density per die to make space for the additional compute units.
However, the achievable level of parallelism is lower than in the other approaches and is defined by the prefetch architecture, i.e., the maximum burst size of the memory banks.
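To make the quantities in this paragraph concrete, the following is a minimal sketch under assumed notation; the symbols $a$, $b$, $c$, $d$, $N_{\mathrm{bits}}$, $BL$ and $W_{\mathrm{DQ}}$ are not taken from the cited works.
A \ac{mac} operation accumulates a product into one of its own operands, whereas a \ac{mad} operation writes the sum of a product and a third operand to a separate destination.
The number of bits delivered per column access, which bounds the operand width of a bank-level compute unit, is taken here as the product of burst length $BL$ and data bus width $W_{\mathrm{DQ}}$.
\begin{equation}
\mathrm{MAC:}\; a \leftarrow a + b \cdot c, \qquad \mathrm{MAD:}\; d \leftarrow a \cdot b + c, \qquad N_{\mathrm{bits}} = BL \cdot W_{\mathrm{DQ}}
\end{equation}
With a hypothetical burst length of 4 on a 64-bit data bus, for instance, each access would deliver 256 bits, i.e., sixteen 16-bit operands.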

Placing the compute units at the \ac{io} region of the \ac{dram} has the fewest physical restrictions and makes complex accelerators possible.
On the downside, bank parallelism cannot be exploited to perform multiple computations concurrently at the bank level.
Also, the energy required to move data to the \ac{io} boundary of the \ac{dram} is far higher than in the other approaches.

In the following, three \ac{pim} approaches that place the compute units at the bank \ac{io} boundary are highlighted in more detail.

\subsection{UPMEM}
\label{sec:pim_upmem}