2024-02-06 22:25:41 +01:00
parent 0c33c99c61
commit 5f52c3ae9f
2 changed files with 25 additions and 0 deletions


@@ -171,6 +171,18 @@
short = MAD,
long = multiply-add,
}
\DeclareAcronym{dpu}{
short = DPU,
long = DRAM processing unit,
}
\DeclareAcronym{risc}{
short = RISC,
long = reduced instruction set computer,
}
\DeclareAcronym{sdk}{
short = SDK,
long = software development kit,
}
\DeclareAcronym{tlm}{
short = TLM,
long = transaction-level modeling,


@@ -75,6 +75,19 @@ In the following, three \ac{pim} approaches that place the compute units at the
\subsection{UPMEM}
\label{sec:pim_upmem}
The first publicly available real-world \ac{pim} architecture has been designed and built by the company UPMEM \cite{gomez-luna2022}.
UPMEM combines regular DDR4 \ac{dimm}-based \ac{dram} with a set of \ac{pim}-enabled UPMEM \acp{dimm}, each consisting of several \ac{pim} chips.
Each \ac{pim} chip contains 8 \acp{dpu}, each of which has exclusive access to a $\qty{64}{\mega\byte}$ memory bank, a $\qty{24}{\kilo\byte}$ instruction memory and a $\qty{64}{\kilo\byte}$ scratchpad memory.
The host processor can access the memory banks to copy input data from main memory and retrieve results.
While copying, the data layout must be changed so that each data word is stored contiguously in a single \ac{pim} bank, in contrast to the horizontal \ac{dram} mapping used in regular \acp{dimm}, where a data word is split across multiple devices.
UPMEM provides a \ac{sdk} that orchestrates the data movement from the main memory to the \ac{pim} banks and modifies the data layout.
Each \ac{dpu} is a multithreaded $\qty{32}{bit}$ \ac{risc} core with a full set of general-purpose registers and a 14-stage pipeline.
The \acp{dpu} execute C code compiled with a specialized toolchain that provides limited support for the standard library.
With a system clock of $\qty{400}{\mega\hertz}$, the internal bandwidth of a \ac{dpu} amounts to $\qty[per-mode = symbol]{800}{\mega\byte\per\second}$.
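Assuming that this peak rate is sustained in every clock cycle, it corresponds to a transfer width per cycle of
\[
    \frac{\qty[per-mode = symbol]{800}{\mega\byte\per\second}}{\qty{400}{\mega\hertz}} = \qty{2}{\byte}.
\]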
A system can integrate up to 20 UPMEM \acp{dimm} with 128 \acp{dpu} each, for a total of 2560 \acp{dpu}.
This gives a maximum aggregate \ac{pim} bandwidth of approximately $\qty[per-mode = symbol]{2}{\tera\byte\per\second}$ \cite{gomez-luna2022}.
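As a rough check of this figure, and under the idealized assumption that all \acp{dpu} stream from their banks at the peak rate simultaneously, the aggregate bandwidth follows from the numbers above:
\[
    20 \times 128 \times \qty[per-mode = symbol]{800}{\mega\byte\per\second}
    = \qty[per-mode = symbol]{2.048}{\tera\byte\per\second}
    \approx \qty[per-mode = symbol]{2}{\tera\byte\per\second}.
\]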
\subsection{Newton AiM}
\label{sec:pim_newton}