% First part of DRAM basics
The exponential growth in compute energy will eventually be constrained by market dynamics, flattening the energy curve and making it impossible to meet future computing demands.
Radical improvements in energy efficiency are therefore required to avoid such a scenario.

In recent years, domain-specific accelerators, such as \acp{gpu} or \acp{tpu}, have become very popular, as they provide orders of magnitude higher performance and energy efficiency for \ac{ai} applications than general-purpose processors \cite{kwon2021}.
However, research must also take off-chip memory into account: moving data between the compute unit and the \ac{dram} is very costly, as fetching the operands consumes more energy than performing the computation on them.
While a double-precision floating-point operation in a $\qty{28}{\nano\meter}$ technology might consume about $\qty{20}{\pico\joule}$, fetching its operands from \ac{dram} consumes almost three orders of magnitude more energy, at about $\qty{16}{\nano\joule}$ \cite{dally2010}.
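The two values quoted above can be used for a rough plausibility check of the stated gap. This check only rearranges those values and adds no new data. The symbols $E_\mathrm{fetch}$ and $E_\mathrm{op}$ are introduced here for illustration only. They denote the energy of fetching the operands and of performing the operation, respectively:
\[
  \frac{E_\mathrm{fetch}}{E_\mathrm{op}} \approx \frac{\qty{16}{\nano\joule}}{\qty{20}{\pico\joule}} = \frac{\qty{16000}{\pico\joule}}{\qty{20}{\pico\joule}} = 800 \approx 10^{2.9}
\]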

Furthermore, many types of \acp{dnn} used for language and speech processing, such as \acp{rnn}, \acp{mlp} and some layers of \acp{cnn}, are severely limited by the memory bandwidth that the \ac{dram} can provide, making them \textit{memory-bounded} \cite{he2020}.
In contrast, compute-intensive workloads, such as visual processing, are referred to as \textit{compute-bounded}.
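This distinction can be sketched with the roofline model, a common performance model that is introduced here only as an illustration and is not taken from the cited works. Assume a processor with a peak compute throughput $P_\mathrm{peak}$ (in FLOP/s) and an off-chip memory bandwidth $B_\mathrm{mem}$ (in byte/s). Assume further a workload with an arithmetic intensity $I$, measured in floating-point operations per byte transferred from \ac{dram}. All of these symbols are assumptions of this sketch. The attainable performance can then be estimated as
\[
  P_\mathrm{att} = \min\left(P_\mathrm{peak},\; I \cdot B_\mathrm{mem}\right).
\]
Under these assumptions, a workload is memory-bounded if $I < P_\mathrm{peak} / B_\mathrm{mem}$, because the \ac{dram} cannot deliver operands fast enough to keep the compute units busy. Otherwise, it is compute-bounded.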

\begin{figure}[!ht]
\centering

All things considered, to meet the need for more energy-efficient computing systems, which are increasingly becoming memory-bounded, new approaches to computing are required.
This has led researchers to reconsider past \ac{pim} architectures and advance them further \cite{lee2021}.
\Ac{pim} integrates computational logic into the \ac{dram} itself in order to minimize data movement and exploit the extensive internal data parallelism of the memory \cite{sudarshan2022}, making it a good fit for memory-bounded problems.

This work analyzes various \ac{pim} architectures, identifies the challenges of integrating them into state-of-the-art \acp{dram}, examines the changes required in the way applications lay out their data in memory, and explores a \ac{pim} implementation from one of the leading \ac{dram} vendors.
The remainder of this work is structured as follows:
Section \ref{sec:dram} gives a brief overview of the architecture of \acp{dram}, in particular that of \ac{hbm}.
Section \ref{sec:pim} presents various types of \ac{pim} architectures and discusses some concrete examples in detail.
Section \ref{sec:vp} introduces virtual prototyping and system-level hardware simulation.