First part of DRAM basics

2024-01-28 19:39:54 +01:00
parent 66d2aaacaf
commit 3273937a5d
7 changed files with 86 additions and 9 deletions

@@ -24,12 +24,12 @@ In addition, Moore's Law is slowing down as further device scaling approaches ph
The exponential growth in compute energy will eventually be constrained by market dynamics, flattening the energy curve and making it impossible to meet future computing demands.
It is therefore required to achieve radical improvements in energy efficiency in order to avoid such a scenario.
-In recent years, domain-specific accelerators, such as \acp{gpu} or \acp{tpu} have become very popular, as they provide orders of magnitude higher performance and energy efficiency for \ac{ai} applications \cite{kwon2021}.
-However, research must also take into account off-chip memory - moving data between the computation unit and the \ac{dram} is very costly, as fetching operands uses consumes more power than performing the computation on them itself.
+In recent years, domain-specific accelerators, such as \acp{gpu} or \acp{tpu} have become very popular, as they provide orders of magnitude higher performance and energy efficiency for \ac{ai} applications than general-purpose processors \cite{kwon2021}.
+However, research must also take into account off-chip memory - moving data between the computation unit and the \ac{dram} is very costly, as fetching operands consumes more power than performing the computation on them itself.
While performing a double precision floating point operation on a $\qty{28}{\nano\meter}$ technology might consume an energy of about $\qty{20}{\pico\joule}$, fetching the operands from \ac{dram} consumes almost 3 orders of magnitude more energy at about $\qty{16}{\nano\joule}$ \cite{dally2010}.
Furthermore, many types of \ac{dnn} used for language and speech processing, such as \acp{rnn}, \acp{mlp} and some layers of \acp{cnn}, are severely limited by the memory bandwidth that the \ac{dram} can provide, making them \textit{memory-bounded} \cite{he2020}.
-In contrast, compute-intensive workloads, such as visual processing, are referred to as \textit{compute-bound}.
+In contrast, compute-intensive workloads, such as visual processing, are referred to as \textit{compute-bounded}.
\begin{figure}[!ht]
\centering
@@ -43,10 +43,10 @@ However, recent \ac{ai} technologies require even greater bandwidth than \ac{hbm
All things considered, to meet the need for more energy-efficient computing systems, which are increasingly becoming memory-bounded, new approaches to computing are required.
This has led researchers to reconsider past \ac{pim} architectures and advance them further \cite{lee2021}.
-\Ac{pim} integrates computational logic into the \ac{dram} itself, to exploit minimal data movement cost and extensive internal data parallelism \cite{sudarshan2022}.
+\Ac{pim} integrates computational logic into the \ac{dram} itself, to exploit minimal data movement cost and extensive internal data parallelism \cite{sudarshan2022}, making it a good fit for memory-bounded problems.
This work analyzes various \ac{pim} architectures, identifies the challenges of integrating them into state-of-the-art \acp{dram}, examines the changes required in the way applications lay out their data in memory and explores a \ac{pim} implementation from one of the leading \ac{dram} vendors.
-The remainder is structured as follows:
+The remainder of this work is structured as follows:
Section \ref{sec:dram} gives a brief overview of the architecture of \acp{dram}, in particular that of \ac{hbm}.
In section \ref{sec:pim} various types of \ac{pim} architectures are presented, with some concrete examples discussed in detail.
Section \ref{sec:vp} is an introduction to virtual prototyping and system-level hardware simulation.