UPMEM
short = MAD,
long = multiply-add,
}
\DeclareAcronym{dpu}{
short = DPU,
long = DRAM processing unit,
}
\DeclareAcronym{risc}{
short = RISC,
long = reduced instruction set computer,
}
\DeclareAcronym{sdk}{
short = SDK,
long = software development kit,
}
\DeclareAcronym{tlm}{
short = TLM,
long = transaction-level modeling,
\subsection{UPMEM}
\label{sec:pim_upmem}
The first publicly available real-world \ac{pim} architecture has been designed and built by the company UPMEM \cite{gomez-luna2022}.
A UPMEM system combines regular DDR4 \ac{dimm}-based \ac{dram} with a set of \ac{pim}-enabled UPMEM \acp{dimm}, each consisting of several \ac{pim} chips.
Each \ac{pim} chip contains 8 \acp{dpu}, each of which has exclusive access to a $\qty{64}{\mega\byte}$ memory bank and is equipped with a $\qty{24}{\kilo\byte}$ instruction memory and a $\qty{64}{\kilo\byte}$ scratchpad memory.
The host processor can access the memory banks to copy input data from main memory and retrieve results.
When copying, the data layout must be transformed so that each data word is stored contiguously in a single \ac{pim} bank, in contrast to the horizontal \ac{dram} mapping of regular \acp{dimm}, where each data word is split across multiple \ac{dram} chips.
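The layout change can be illustrated with a small C sketch. It assumes a rank of eight $\times$8 \ac{dram} chips, so that chip $c$ stores byte lane $c$ of every $\qty{64}{bit}$ bus transfer into one of its \ac{pim} banks; the function names are hypothetical and are not part of the UPMEM \ac{sdk}. Under these assumptions, the host has to treat each group of eight words as an $8 \times 8$ byte matrix and transpose it, so that chip $c$ ends up holding all bytes of word $c$:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one rank of 8 x8 DRAM chips, i.e. 8 byte lanes per 64-bit
 * bus transfer ("beat"), with chip c storing byte lane c. */
#define LANES 8

/* Hypothetical host-side helper: converts 8 words (word c destined for
 * a bank behind chip c) into the 8 beats written over the bus.
 * Beat b carries byte b of every word, i.e. an 8x8 byte transpose. */
static void to_pim_layout(const uint64_t words[LANES], uint64_t beats[LANES]) {
    for (int b = 0; b < LANES; b++) {
        uint64_t beat = 0;
        for (int c = 0; c < LANES; c++) {
            uint64_t byte = (words[c] >> (8 * b)) & 0xFFu; /* byte b of word c */
            beat |= byte << (8 * c);                       /* sent on lane c */
        }
        beats[b] = beat;
    }
}

/* Models what chip c stores: byte lane c of each beat, in beat order,
 * so its local copy holds the complete, contiguous word c. */
static uint64_t chip_local_word(const uint64_t beats[LANES], int chip) {
    assert(chip >= 0 && chip < LANES);
    uint64_t word = 0;
    for (int b = 0; b < LANES; b++) {
        uint64_t byte = (beats[b] >> (8 * chip)) & 0xFFu;
        word |= byte << (8 * b);
    }
    return word;
}
```

Applying the same transposition to the data read back from the banks restores the host layout, since an $8 \times 8$ transpose is its own inverse.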
UPMEM provides a \ac{sdk} that orchestrates the data movement from the main memory to the \ac{pim} banks and performs the required layout transformation.

Each \ac{dpu} is a multithreaded $\qty{32}{bit}$ \ac{risc} core with a full set of general-purpose registers and a 14-stage pipeline.
The \acp{dpu} execute compiled C code, using a specialized compiler toolchain that offers only limited support for the C standard library.
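One way to use the multithreading is to let every thread stream its own share of the memory bank through the scratchpad. The following plain-C sketch models this for a sum over 32-bit values stored in a single bank; it runs on a regular host, \texttt{memcpy} stands in for the bank-to-scratchpad transfer, and the block size, thread bound, and all function names are hypothetical. To our knowledge, the UPMEM \ac{sdk} exposes the thread index and the transfer as \texttt{me()} and \texttt{mram\_read()}, which this sketch does not use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parameters: the 64 kB scratchpad size is from the text,
 * the block size and thread bound are illustrative choices. */
#define WRAM_BYTES   (64u * 1024u)
#define BLOCK_BYTES  256u
#define BLOCK_ELEMS  (BLOCK_BYTES / sizeof(uint32_t))
#define MAX_TASKLETS 16u

/* All per-thread staging buffers must fit into the scratchpad together. */
_Static_assert(MAX_TASKLETS * BLOCK_BYTES <= WRAM_BYTES,
               "staging buffers exceed the scratchpad");

/* One thread's share: blocks t, t + n, t + 2n, ... of the bank (round-robin).
 * memcpy stands in for the bank-to-scratchpad transfer. */
static uint64_t tasklet_sum(const uint32_t *bank, size_t n_elems,
                            unsigned tasklet_id, unsigned nr_tasklets) {
    uint32_t wram_buf[BLOCK_ELEMS]; /* this thread's scratchpad buffer */
    uint64_t sum = 0;
    for (size_t first = (size_t)tasklet_id * BLOCK_ELEMS; first < n_elems;
         first += (size_t)nr_tasklets * BLOCK_ELEMS) {
        size_t count = (n_elems - first < BLOCK_ELEMS) ? n_elems - first : BLOCK_ELEMS;
        memcpy(wram_buf, bank + first, count * sizeof(uint32_t));
        for (size_t i = 0; i < count; i++)
            sum += wram_buf[i];
    }
    return sum;
}

/* Runs the threads one after another (a real DPU interleaves them in its
 * pipeline) and adds up their partial sums. */
static uint64_t dpu_sum(const uint32_t *bank, size_t n_elems, unsigned nr_tasklets) {
    assert(nr_tasklets > 0 && nr_tasklets <= MAX_TASKLETS);
    uint64_t total = 0;
    for (unsigned t = 0; t < nr_tasklets; t++)
        total += tasklet_sum(bank, n_elems, t, nr_tasklets);
    return total;
}
```

With, for example, 11 threads, every block is still processed exactly once, because block $k$ is always handled by thread $k \bmod 11$.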
With a clock frequency of $\qty{400}{\mega\hertz}$, the bandwidth between a \ac{dpu} and its memory bank amounts to $\qty[per-mode = symbol]{800}{\mega\byte\per\second}$.
Each UPMEM \ac{dimm} integrates 128 \acp{dpu}, and a system can hold up to 20 UPMEM \acp{dimm}.
This yields a total of 2560 \acp{dpu} and a maximum aggregate \ac{pim} bandwidth of $\qty[per-mode = symbol]{2}{\tera\byte\per\second}$ \cite{gomez-luna2022}.
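The aggregate figure follows directly from the per-\ac{dpu} numbers. The derivation below only reproduces that arithmetic; it reads the quoted $\qty[per-mode = symbol]{800}{\mega\byte\per\second}$ at $\qty{400}{\mega\hertz}$ as $\qty{2}{\byte}$ per cycle (simply the ratio of the two values) and assumes that all \acp{dpu} access their banks simultaneously:
\begin{align*}
  B_{\text{DPU}} &= \qty{400}{\mega\hertz} \times \qty{2}{\byte} = \qty[per-mode = symbol]{800}{\mega\byte\per\second},\\
  N_{\text{DPU}} &= 20 \times 128 = 2560,\\
  B_{\text{PIM}} &= N_{\text{DPU}} \cdot B_{\text{DPU}} = 2560 \times \qty[per-mode = symbol]{800}{\mega\byte\per\second} = \qty[per-mode = symbol]{2.048}{\tera\byte\per\second} \approx \qty[per-mode = symbol]{2}{\tera\byte\per\second}.
\end{align*}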
\subsection{Newton AiM}
\label{sec:pim_newton}