Complete DRAM and HBM2 chapter
@@ -40,8 +40,13 @@
}
\DeclareAcronym{hbm}{
short = HBM,
alt = HBM2,
long = High Bandwidth Memory,
}
\DeclareAcronym{sip}{
short = SiP,
long = System in Package,
}
\DeclareAcronym{pim}{
short = PIM,
long = processing-in-memory,
}
@@ -130,7 +135,23 @@
short = AM,
long = address mapping,
}
\DeclareAcronym{jedec}{
short = JEDEC,
long = Joint Electron Device Engineering Council,
}
\DeclareAcronym{ddr4}{
short = DDR4,
long = Double Data Rate 4,
}
\DeclareAcronym{tsv}{
short = TSV,
long = through-silicon via,
}
\DeclareAcronym{pch}{
short = pCH,
long = pseudo channel,
}
\DeclareAcronym{tlm}{
short = TLM,
long = transaction-level modeling,
}
@@ -92,10 +92,43 @@ Because banks can be controlled independently, one bank can be outputting the ne
In addition to \ac{dimm}-based \ac{dram}, which is mainly used in desktop workstations, there are alternative \ac{dram} subsystems.
One of these is device-based \ac{dram}, where the memory devices are directly soldered onto the same \ac{pcb} as the \ac{mpsoc}.
Another type is 2.5D-integrated \ac{dram}, where multiple memory dies are stacked on top of each other and connected to the \ac{mpsoc} by a silicon interposer \cite{jung2017a}.
Such a 2.5D-integrated type used in \acp{gpu} and \acp{tpu} is \ac{hbm}, which will be introduced in greater detail in the following section.

\subsection{\Acf{hbm}}
\label{sec:hbm}

% similar to ranks, pch ...
\Aca{hbm} is a \ac{dram} standard that was defined by \ac{jedec} in 2016 as a successor to the previous \ac{hbm} standard \cite{jedec2015a}.
What differentiates \ac{hbm} from other types of memory is its \ac{sip} approach.
Several \ac{dram} dies are stacked on top of each other and connected with \acp{tsv} to form a cube of memory dies consisting of many layers and a buffer die at the bottom, as shown in Figure \ref{img:sip}.
\begin{figure}
\centering
\includegraphics[width=0.7\linewidth]{images/sip}
\caption[Cross-section view of an \ac{hbm} \ac{sip}]{Cross-section view of an \ac{hbm} \ac{sip} \cite{lee2021}.}
\label{img:sip}
\end{figure}
Such a cube is then placed onto a common silicon interposer that connects it to its host processor.
This packaging brings the memory closer to the \ac{mpsoc}, which reduces the latency, minimizes the bus capacitance and, most importantly, allows for a very wide memory interface.
For example, compared to a conventional \acs{ddr4} \ac{dram}, this tight integration enables $\qtyrange[range-units=single]{10}{13}{\times}$ more \ac{io} connections to the \ac{mpsoc} and $\qtyrange[range-units=single]{2}{2.4}{\times}$ lower energy per bit transfer \cite{lee2021}.

One memory stack supports up to 8 independent memory channels, each of which contains up to 16 banks divided into 4 bank groups.
The command, address and data bus operate at \ac{ddr}, i.e., they transfer two words per interface clock cycle $t_{CK}$.
With an interface clock frequency of $\qty{1}{\giga\hertz}$, \aca{hbm} achieves a pin transfer rate of $\qty{2}{\giga T \per\second}$, resulting in $\qty[per-mode = symbol]{256}{\giga\byte\per\second}$ for the 1024-bit wide data bus of each stack.
A single data transfer is performed with either a \ac{bl} of 2 or 4, depending on the \ac{pch} configuration.
In \ac{pch} mode, the data bus is split in half (i.e., into two 64-bit halves) to enable independent data transmission, further increasing parallelism, while the two \acp{pch} share a common command and address bus.
Thus, accessing \aca{hbm} in \ac{pch} mode transmits a $\qty{256}{\bit}=\qty{32}{\byte}$ burst with a \ac{bl} of 4 over the $\qty{64}{\bit}$ wide data bus.
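As a quick consistency check, the per-stack bandwidth quoted above follows directly from the pin transfer rate and the width of the data bus, using only the figures given in this paragraph:
\begin{equation*}
\qty{2}{\giga T \per\second} \times \qty{1024}{\bit} = \qty[per-mode = symbol]{2048}{\giga\bit\per\second} = \qty[per-mode = symbol]{256}{\giga\byte\per\second}
\end{equation*}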

Figure \ref{img:hbm} illustrates the internal architecture of a single memory die.
It consists of 2 independent channels, each with 2 \acp{pch} of 4 bank groups with 4 banks each, resulting in 16 banks per \ac{pch}.
In the center of the die, the \acp{tsv} connect to the next die above or the previous die below.
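The bank count per \ac{pch} is simply the product of this organization; the total per die (not stated explicitly above, but implied by it) follows the same way:
\begin{equation*}
4\,\text{bank groups} \times 4\,\text{banks} = 16\,\text{banks per \ac{pch}}, \qquad
2\,\text{channels} \times 2\,\text{\acp{pch}} \times 16 = 64\,\text{banks per die}
\end{equation*}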
\begin{figure}
\centering
\includegraphics[width=0.7\linewidth]{images/hbm}
\caption[\aca{hbm} memory die architecture]{\aca{hbm} memory die architecture \cite{lee2021}.}
\label{img:hbm}
\end{figure}

% still, bandwidth requirements of new AI applications are not met by HBM2
Although \aca{hbm} provides a high amount of bandwidth, many modern \ac{dnn} applications remain memory-bound.
While one approach would be to further increase the bandwidth by integrating more stacks on the silicon interposer, other constraints such as thermal limits or the limited number of \ac{io} connections on the interposer may make this impractical \cite{lee2021}.
Another approach could be \acf{pim}: using \ac{hbm}'s 2.5D architecture, it is possible to incorporate additional compute units directly into the memory stacks, increasing the achievable parallel bandwidth and reducing the burden of transferring all the data to the host processor to perform operations on it.

@@ -23,6 +23,8 @@ brief overview of the categories of PIM (paper from the chair)

\subsection{Newton AiM}
\label{sec:pim_newton}

gddr (device-based)

\subsection{FIMDRAM/HBM-PIM}
\label{sec:pim_fim}
differences from the Hynix PIM

@@ -142,6 +142,14 @@
  file = {/home/derek/Nextcloud/Verschiedenes/Zotero/storage/BNREUV34/Jacob et al. - 2008 - Memory systems Cache, DRAM, Disk.pdf}
}

@misc{jedec2015a,
  title = {{{High Bandwidth Memory}} ({{HBM}}) {{DRAM}}},
  author = {{JEDEC}},
  year = {2015},
  month = nov,
  file = {/home/derek/Nextcloud/Verschiedenes/Zotero/storage/TZ9AHMH8/JESD235A_HBM.pdf}
}

@misc{jedec2021b,
  title = {{{DDR5 SDRAM}}},
  author = {{JEDEC}},
BIN src/images/hbm.pdf Normal file (binary file not shown)
BIN src/images/hbm_old.pdf Normal file (binary file not shown)
BIN src/images/sip.pdf Normal file (binary file not shown)
@@ -112,7 +112,7 @@
\begingroup
\phantomsection
\addcontentsline{toc}{section}{List of Abbreviations}
\printacronyms[name=List of Abbreviations,pages={display=all}]
% \setlength{\nomitemsep}{8pt}
\endgroup
\newpage