An important component of these models makes use of \acp{dnn}, which are a type of artificial neural network with multiple layers between input and output.
Consequently, \acp{dnn} make it possible to tackle many new classes of problems that were previously beyond the reach of conventional algorithms.
However, the ever-increasing use of these technologies poses new challenges for hardware architectures, as the energy required to train and run these models reaches unprecedented levels.
Recently published numbers approximate that the development and training of Meta's LLaMA model over a period of about 5 months consumed around $\qty{2638}{\mega\watt\hour}$ of electrical energy and caused a total emission of $\qty{1015}{tCO_2eq}$ \cite{touvron2023}.
As these numbers are expected to increase in the future, it is clear that the energy footprint of the current deployment of \ac{ai} applications is not sustainable \cite{blott2023}.
In a more general view, the energy demand of computing for new applications continues to grow exponentially, doubling about every two years, while the world's energy production only grows linearly, at about $\qty{2}{\percent}$ per year \cite{src2021}.
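The gap between these two growth rates can be illustrated with a back-of-the-envelope projection. Note that the starting headroom of 100x assumed below is an arbitrary illustrative value, not a figure from the cited source:

```python
# Illustrative projection: compute energy demand doubling every two years
# vs. world energy production growing ~2% per year (src2021 trends).
# The initial 100x supply headroom is an assumption for illustration only.
demand = 1.0      # normalized compute energy demand at year 0
supply = 100.0    # assume supply starts with 100x headroom

for year in range(0, 21, 2):
    print(f"year {year:2d}: demand {demand:8.1f}, supply {supply:6.1f}")
    demand *= 2            # doubles every two years
    supply *= 1.02 ** 2    # grows 2% per year over two years
# Demand overtakes the initially 100x larger supply by year 16 (256 vs ~137).
```

However large the assumed headroom, exponential demand overtakes linear-like supply within a few doubling periods, which is the core of the sustainability argument above.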
This dramatic increase in energy consumption is due to the fact that, while the energy efficiency of processing units has continued to improve, the ever-increasing demand for computing is outpacing this progress.
In addition, Moore's Law is slowing down as further device scaling approaches physical limits.
\begin{figure}[!ht]
\centering
\input{plots/energy_chart}
\caption[Total energy of computing]{Total energy of computing \cite{src2021}}
\label{plt:enery_chart}
\end{figure}
The exponential growth in compute energy will eventually be constrained by market dynamics, flattening the energy curve and making it impossible to meet future computing demands.
Radical improvements in energy efficiency are therefore required to avoid such a scenario.
% -> more efficient systems
% discussion refers mainly to processors
% -> must above all consider memory, movement-cost diagram
In recent years, domain-specific accelerators, such as \acp{gpu} or \acp{tpu}, have become very popular, as they provide orders of magnitude higher performance and energy efficiency for \ac{ai} applications \cite{kwon2021}.
However, research must also consider the off-chip memory: data movement between the computation unit and the \ac{dram} comes at a high cost, as fetching the operands consumes more energy than performing the computation on them.
While performing a double precision floating point operation on a $\qty{28}{\nano\meter}$ technology might consume an energy of about $\qty{20}{\pico\joule}$, fetching the operands from \ac{dram} consumes almost 3 orders of magnitude more energy at about $\qty{16}{\nano\joule}$ \cite{dally2010}.
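Put concretely, the ratio between these two figures works out as follows, a trivial sanity check using only the numbers quoted from \cite{dally2010}:

```python
# Energy figures quoted above for a 28 nm technology (Dally):
compute_energy_pj = 20.0        # one double-precision FLOP: ~20 pJ
dram_fetch_energy_pj = 16000.0  # fetching operands from DRAM: ~16 nJ = 16,000 pJ

ratio = dram_fetch_energy_pj / compute_energy_pj
print(ratio)  # 800.0 -> the fetch costs ~800x, i.e. almost 3 orders of magnitude
```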
Furthermore, many types of \ac{dnn} used for language and speech processing, such as \acp{rnn}, \acp{mlp} and some layers of \acp{cnn}, are severely limited by the memory bandwidth that the \ac{dram} can provide, in contrast to compute-intensive workloads such as visual processing \cite{he2020}.
Such workloads are referred to as \textit{memory-bound}.
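This distinction is made precise by the roofline model: a kernel whose arithmetic intensity (operations per byte moved) lies below the machine balance point cannot saturate the compute units. A minimal sketch of that check follows; the peak-performance and bandwidth values are illustrative assumptions, not measured figures for any specific device:

```python
# Minimal roofline check: a kernel is memory-bound when its arithmetic
# intensity (FLOPs per byte moved) falls below the machine balance point
# (peak FLOP/s divided by peak memory bandwidth).
# Hardware numbers below are illustrative placeholders, not measurements.
peak_flops = 10e12          # assumed 10 TFLOP/s peak compute
peak_bandwidth = 1e12       # assumed 1 TB/s peak DRAM bandwidth (HBM-class)
machine_balance = peak_flops / peak_bandwidth   # 10 FLOPs per byte

def attainable_gflops(arithmetic_intensity):
    """Roofline: performance is capped by either compute or bandwidth."""
    return min(peak_flops, arithmetic_intensity * peak_bandwidth) / 1e9

# A GEMV-like RNN/MLP layer touches every weight once: ~0.5 FLOP/byte.
print(attainable_gflops(0.5))    # 500.0 -> bandwidth-limited (memory-bound)
# A large GEMM reuses operands heavily: e.g. 100 FLOPs/byte.
print(attainable_gflops(100.0))  # 10000.0 -> compute-limited
```

With these assumed numbers, any kernel below 10 FLOPs per byte, which includes the GEMV-dominated \acp{rnn} and \acp{mlp} mentioned above, is capped by memory bandwidth rather than by compute throughput.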
\begin{figure}[!ht]
\centering
\input{plots/roofline}
\caption[Roofline model of GPT revisions]{Roofline model of GPT revisions \cite{ivobolsens2023}}
\label{plt:roofline}
\end{figure}
In the past, specialized types of \ac{dram} such as \ac{hbm} have been able to meet high bandwidth requirements.
However, recent \ac{ai} technologies require even greater bandwidth than \ac{hbm} can provide \cite{kwon2021}.
All things considered, new approaches to computing are required to meet the need for energy-efficient systems whose workloads are increasingly memory-bound.
This has led researchers to reconsider past \ac{pim} architectures and advance them further \cite{lee2021}.
\Ac{pim} integrates computational logic into the \ac{dram} itself, exploiting the minimal data-movement cost and the extensive internal data parallelism \cite{sudarshan2022}.
This work analyzes various \ac{pim} architectures, identifies the challenges of integrating them into state-of-the-art \acp{dram}, examines the changes required in the way applications lay out their data in memory, and explores a \ac{pim} implementation from one of the leading \ac{dram} vendors.
The remainder of this work is structured as follows:
Section \ref{sec:dram} gives a brief overview of the architecture of \acp{dram}, in particular that of \ac{hbm}.
Section \ref{sec:pim} presents various types of \ac{pim} architectures and discusses some concrete examples in detail.
Section \ref{sec:vp} is an introduction to virtual prototyping and system-level hardware simulation.
After explaining the necessary prerequisites, section \ref{sec:implementation} describes the implementation of a concrete \ac{pim} architecture in software and provides a development library that applications can use to take advantage of in-memory processing.
Section \ref{sec:results} demonstrates the possible performance enhancement of \ac{pim} by simulating a typical neural-network inference.
Finally, section \ref{sec:conclusion} summarizes the findings and identifies future improvements for \ac{pim} architectures.