A key component of these models is the use of \acp{dnn}, which are a type of machine learning model.
Consequently, \acp{dnn} make it possible to tackle many new classes of problems that were previously beyond the reach of conventional algorithms.
However, the ever-increasing use of these technologies poses new challenges for hardware architectures, as the energy required to train and run these models reaches unprecedented levels.
Recently published numbers approximate that the development and training of Meta's LLaMA model over a period of about five months consumed around $\qty{2638}{\mega\watt\hour}$ of electrical energy and caused a total emission of $\qty{1015}{tCO_2eq}$ \cite{touvron2023}.
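A back-of-the-envelope division of these two reported figures yields the average carbon intensity of the training run:
\[
\frac{\qty{1015}{tCO_2eq}}{\qty{2638}{\mega\watt\hour}} \approx 385\,\mathrm{g\,CO_2eq}/\mathrm{kWh}.
\]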
As these numbers are expected to increase in the future, it is clear that the energy footprint of the current deployment of \ac{ai} applications is not sustainable \cite{blott2023}.
In a more general view, the energy demand of computing for new applications continues to grow exponentially, doubling about every two years, while the world's energy production only grows linearly, at about $\qty{2}{\percent}$ per year \cite{src2021}.
This dramatic increase in energy consumption is due to the fact that, while the energy efficiency of compute processor units has continued to improve, the ever-increasing demand for computing is outpacing this progress.
In addition, Moore's Law is slowing down as further device scaling approaches physical limits.
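The mismatch between these two growth rates compounds quickly; extrapolated over a single decade, demand and supply scale as
\[
2^{10/2} = 32 \qquad \text{versus} \qquad 1.02^{10} \approx 1.22,
\]
i.e., compute demand grows by a factor of 32 while energy production grows by barely a quarter.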
\begin{figure}[!ht]
In recent years, domain-specific accelerators, such as \acp{gpu} or \acp{tpu}, have emerged to address these challenges.
However, research must also take into account off-chip memory: moving data between the computation unit and the \ac{dram} is very costly, as fetching operands consumes more energy than performing the computation on them.
While performing a double-precision floating-point operation on a $\qty{28}{\nano\meter}$ technology might consume about $\qty{20}{\pico\joule}$, fetching the operands from \ac{dram} consumes almost three orders of magnitude more energy, at about $\qty{16}{\nano\joule}$ \cite{dally2010}.
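Relating the two energy figures makes the penalty explicit:
\[
\frac{\qty{16}{\nano\joule}}{\qty{20}{\pico\joule}} = 800 \approx 10^{2.9},
\]
so a single \ac{dram} fetch costs about as much energy as 800 floating-point operations.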
Furthermore, many types of \acp{dnn} used for language and speech processing, such as \acp{rnn}, \acp{mlp} and some layers of \acp{cnn}, are severely limited by the memory bandwidth that the \ac{dram} can provide, making them \textit{memory-bounded} \cite{he2020}.
In contrast, compute-intensive workloads, such as visual processing, are referred to as \textit{compute-bounded}.
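This distinction can be made quantitative through the arithmetic intensity of a layer, i.e., the number of operations performed per byte moved between memory and the compute unit; the layer dimensions below are illustrative assumptions, not taken from \cite{he2020}. A fully connected layer $y = Wx$ with an $n \times n$ weight matrix in single precision performs $2n^2$ operations but must fetch about $4n^2$ bytes of weights, giving
\[
I_{\mathrm{fc}} \approx \frac{2n^2}{4n^2} = 0.5\ \text{ops/byte},
\]
independent of $n$, which places \acp{mlp} and \acp{rnn} deep in the memory-bounded regime. A $3 \times 3$ convolution with $C_{\mathrm{in}} = C_{\mathrm{out}} = 256$ channels on a $56 \times 56$ feature map, by contrast, reuses every weight $56 \cdot 56$ times and reaches an intensity of roughly $420$ ops/byte, making it compute-bounded.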
\begin{figure}[!ht]
The remainder of this work is structured as follows:
In \cref{sec:pim}, various types of \ac{pim} architectures are presented, with some concrete examples discussed in detail.
\Cref{sec:vp} is an introduction to virtual prototyping and system-level hardware simulation.
After explaining the necessary prerequisites, \cref{sec:implementation} presents a software implementation of a concrete \ac{pim} architecture and a development library that applications can use to take advantage of in-memory processing.
\Cref{sec:results} demonstrates the possible performance enhancement of \ac{pim} by simulating a typical neural network inference.
Finally, \cref{sec:conclusion} summarizes the findings and identifies future improvements for \ac{pim} architectures.