First part of DRAM basics
@@ -34,6 +34,10 @@
short = DRAM,
long = dynamic random-access memory,
}
\DeclareAcronym{ram}{
short = RAM,
long = random-access memory,
}
\DeclareAcronym{hbm}{
short = HBM,
long = High Bandwidth Memory,
@@ -42,6 +46,38 @@
short = PIM,
long = processing-in-memory,
}
\DeclareAcronym{subarray}{
short = SA,
long = subarray,
}
\DeclareAcronym{lwl}{
short = LWL,
long = local wordline,
}
\DeclareAcronym{lbl}{
short = LBL,
long = local bitline,
}
\DeclareAcronym{mwl}{
short = MWL,
long = master wordline,
}
\DeclareAcronym{mbl}{
short = MBL,
long = master bitline,
}
\DeclareAcronym{psa}{
short = PSA,
long = primary sense amplifier,
}
\DeclareAcronym{ssa}{
short = SSA,
long = secondary sense amplifier,
}
\DeclareAcronym{csl}{
short = CSL,
long = column select line,
}
\DeclareAcronym{tlm}{
short = TLM,
long = transaction level modeling,
}

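These declarations use the \texttt{acro} package's \verb|\DeclareAcronym| interface. A minimal sketch of how such acronyms are then referenced in the text (assuming \texttt{acro} is loaded in the preamble):

```latex
% Preamble: load acro and declare an acronym.
\usepackage{acro}
\DeclareAcronym{dram}{
    short = DRAM,
    long  = dynamic random-access memory,
}

% Body: \ac expands to the long form with the short form in
% parentheses on first use, and to the short form afterwards.
\ac{dram}   % first use: dynamic random-access memory (DRAM)
\ac{dram}   % later uses: DRAM
\acp{dram}  % plural form: DRAMs
\Ac{dram}   % capitalized variant, for sentence beginnings
```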
@@ -1,2 +1,43 @@
\section{DRAM Architecture}
\label{sec:dram}

This section introduces the basics of modern DRAM architecture and provides the background necessary to understand the theory behind various \ac{pim} integrations.
In particular, the architecture of \ac{hbm} will be discussed, since it is the \ac{dram} technology on which the \ac{pim} architecture implemented in this thesis is based.

\subsection{DRAM Basics}
\label{sec:dram_basics}

A \ac{dram} is a special type of \ac{ram} that uses a single transistor-capacitor pair as a memory cell to encode exactly one bit \cite{jacob2008}.
Since a capacitor holds electrical charge, this is a volatile type of storage: the stored charge leaks away over time and the bit value it represents eventually vanishes.
To circumvent this, regular \textit{refresh} operations are required, which read and rewrite the stored value; this is what makes the storage method \textit{dynamic}.
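To give a sense of scale, here is an illustrative calculation; the figures (a $\qty{64}{\milli\second}$ retention period spread over $8192$ refresh commands) are common DDRx specification values and are not taken from the sources cited in this section:

```latex
% Average refresh command interval, assuming a 64 ms retention
% period and 8192 refresh commands per period (common DDRx values):
\begin{equation*}
    t_{REFI} = \frac{\qty{64}{\milli\second}}{8192}
             \approx \qty{7.8}{\micro\second}
\end{equation*}
```

Under these assumptions, the memory controller must issue a refresh command roughly every $\qty{7.8}{\micro\second}$, during which the affected bank is unavailable.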
A typical \ac{dram} device consists of several banks, which are themselves composed of a set of \textit{memory arrays}, which in turn are composed of multiple \acp{subarray}.
Banks operate independently of each other, while the memory arrays of each bank operate in lockstep to form the per-device data word, with the number of data bits equal to the number of memory arrays per bank.
The \acp{subarray} are grid-like structures composed of \acp{lwl} and \acp{lbl}, with a storage cell at each intersection point.
The \ac{lwl} is connected to the transistor's gate, switching it on and off, while the \ac{lbl} is used to access the stored value.
Global \acp{mwl} and \acp{mbl} span over all \acp{subarray}, forming complete \textit{rows} and \textit{columns} of a memory array.

Because the charge stored in each cell is very small, so-called \acp{psa} are needed to amplify the stored voltage of each cell while it is connected to the shared \ac{lbl} \cite{jacob2008}, as illustrated in Figure~\ref{img:psa}.

\begin{figure}[!ht]
\centering
\includegraphics{images/psa}
\caption[\ac{psa} of an open bitline architecture]{\ac{psa} of an open bitline architecture \cite{jacob2008, jung2017a}}
\label{img:psa}
\end{figure}

However, before a value can be read, the \ac{psa} needs to \textit{precharge} its bitline to a halfway voltage $\frac{V_{DD}}{2}$ between 0 and $V_{DD}$.
When the capacitor is then connected to the bitline, it pushes the voltage level marginally in one direction, enough for the \ac{psa} to detect the voltage difference to an adjacent bitline in another \ac{subarray} and amplify the voltage level all the way to high or low.
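The size of this perturbation can be sketched with a simple charge-sharing estimate; the symbols $C_{cell}$ and $C_{BL}$ for the cell and bitline capacitances are illustrative and not taken from the cited sources:

```latex
% Charge conservation between the cell capacitance C_cell (charged to
% V_cell, i.e. 0 or V_DD) and the bitline capacitance C_BL (precharged
% to V_DD/2) gives the voltage swing seen by the PSA:
\begin{equation*}
    \Delta V_{BL} = \frac{C_{cell}}{C_{cell} + C_{BL}}
                    \left( V_{cell} - \frac{V_{DD}}{2} \right)
\end{equation*}
```

Since $C_{BL}$ is typically much larger than $C_{cell}$, $\Delta V_{BL}$ amounts to only a small fraction of $V_{DD}$, which is why the \ac{psa} must amplify it to a full logic level.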

The process of loading the stored value into the \ac{psa} is done for all columns at the same time and is called \textit{row activation}.
Once a row is activated, it is referred to as \textit{open}.
% \ac{csl}

\begin{figure}[!ht]
\centering
\includegraphics{images/bank}
\caption[\ac{dram} bank architecture]{\ac{dram} bank architecture \cite{jung2017a}}
\label{img:bank}
\end{figure}

\subsection{High Bandwidth Memory}
\label{sec:hbm}

@@ -24,12 +24,12 @@ In addition, Moore's Law is slowing down as further device scaling approaches ph
The exponential growth in compute energy will eventually be constrained by market dynamics, flattening the energy curve and making it impossible to meet future computing demands.
Radical improvements in energy efficiency are therefore required to avoid such a scenario.

In recent years, domain-specific accelerators such as \acp{gpu} or \acp{tpu} have become very popular, as they provide orders of magnitude higher performance and energy efficiency for \ac{ai} applications than general-purpose processors \cite{kwon2021}.
However, research must also take into account off-chip memory: moving data between the computation unit and the \ac{dram} is very costly, as fetching operands consumes more power than performing the computation itself.
While performing a double precision floating point operation on a $\qty{28}{\nano\meter}$ technology might consume an energy of about $\qty{20}{\pico\joule}$, fetching the operands from \ac{dram} consumes almost 3 orders of magnitude more energy at about $\qty{16}{\nano\joule}$ \cite{dally2010}.
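The quoted gap can be verified directly from the two numbers:

```latex
% Ratio of DRAM fetch energy to compute energy:
\begin{equation*}
    \frac{\qty{16}{\nano\joule}}{\qty{20}{\pico\joule}}
    = \frac{\qty{16000}{\pico\joule}}{\qty{20}{\pico\joule}}
    = 800 \approx 10^{2.9}
\end{equation*}
```

that is, close to three orders of magnitude.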

Furthermore, many types of \ac{dnn} used for language and speech processing, such as \acp{rnn}, \acp{mlp} and some layers of \acp{cnn}, are severely limited by the memory bandwidth that the \ac{dram} can provide, making them \textit{memory-bound} \cite{he2020}.
In contrast, compute-intensive workloads, such as visual processing, are referred to as \textit{compute-bound}.

\begin{figure}[!ht]
\centering
@@ -43,10 +43,10 @@ However, recent \ac{ai} technologies require even greater bandwidth than \ac{hbm

All things considered, to meet the need for more energy-efficient computing systems, which are increasingly becoming memory-bound, new approaches to computing are required.
This has led researchers to reconsider past \ac{pim} architectures and advance them further \cite{lee2021}.
\Ac{pim} integrates computational logic into the \ac{dram} itself to exploit minimal data movement cost and extensive internal data parallelism \cite{sudarshan2022}, making it a good fit for memory-bound problems.

This work analyzes various \ac{pim} architectures, identifies the challenges of integrating them into state-of-the-art \acp{dram}, examines the changes required in the way applications lay out their data in memory, and explores a \ac{pim} implementation from one of the leading \ac{dram} vendors.
The remainder of this work is structured as follows:
Section \ref{sec:dram} gives a brief overview of the architecture of \acp{dram}, in particular that of \ac{hbm}.
Section \ref{sec:pim} presents various types of \ac{pim} architectures, with some concrete examples discussed in detail.
Section \ref{sec:vp} is an introduction to virtual prototyping and system-level hardware simulation.

@@ -47,7 +47,7 @@
archiveprefix = {arxiv},
langid = {english},
keywords = {read},
file = {/home/derek/Nextcloud/Verschiedenes/Zotero/storage/UFED59VX/Chen et al. - 2023 - SimplePIM A Software Framework for Productive and.pdf}
}

@misc{dally2010,
BIN src/images/bank.pdf Normal file
BIN src/images/psa.pdf Normal file
@@ -28,8 +28,8 @@
40 70
100 70
}
node[above,sloped,pos=0.25,scale=0.8] {\textit{memory-bound}}
node[above,pos=0.75,scale=0.8] {\textit{compute-bound}};

\addplot [very thick, dashed, BrickRed]
table {
