Add abstract
@@ -1,12 +1,22 @@
|
|||||||
\begin{abstract}
|
\begin{abstract}
|
||||||
\section*{Abstract}
|
\section*{Abstract}
|
||||||
|
|
||||||
|
In our increasingly data-oriented world, machine learning applications such as \acp*{llm} for language processing, \acp*{cnn} for image recognition, and \acp*{rnn} for processing sequential data are becoming ever more important.
|
||||||
|
These new systems rely on \acp*{dnn} as an important component.
|
||||||
|
Specialized processors such as \acp*{gpu} or \acp*{tpu} have been used to accelerate such \acsp*{dnn}.
|
||||||
|
However, it has become apparent that the performance of \acsp*{dnn} is increasingly limited not by the available computing power but by the memory bandwidth of the \acp*{dram}.
|
||||||
|
One possible solution to this problem is the use of \ac*{pim}, i.e., the processing of data directly in memory.
|
||||||
|
This work examines which applications are suitable for \acs*{pim} and what performance effects can be expected.
|
||||||
|
|
||||||
\vspace{1.0cm}
|
\vspace{1.0cm}
|
||||||
|
|
||||||
\section*{Zusammenfassung}
|
\section*{Zusammenfassung}
|
||||||
|
|
||||||
|
In unserer zunehmend datenorientierten Welt gewinnen Anwendungen des maschinellen Lernens wie \acp*{llm} zur Verarbeitung von Sprache, \acp*{cnn} zur Bilderkennung oder \acp*{rnn} zur Verarbeitung von sequenziellen Daten an immer größerer Bedeutung.
|
||||||
|
Ein wichtiger Bestandteil dieser neuen Systeme sind \acp*{dnn}.
|
||||||
|
Zur Beschleunigung der Berechnung solcher \acsp*{dnn} wurden in der Vergangenheit spezialisierte Prozessoren wie \acp*{gpu} oder \acp*{tpu} eingesetzt.
|
||||||
|
Es zeigt sich allerdings, dass die Leistung von \acsp*{dnn} zunehmend weniger durch die bereitgestellte Rechenleistung begrenzt wird, sondern vielmehr durch die begrenzte Speicherbandbreite des \acp*{dram}.
|
||||||
|
Eine mögliche Lösung für dieses Problem ist die Nutzung von \ac*{pim}, also die Verarbeitung von Daten direkt im Speicher.
|
||||||
|
In dieser Arbeit wird untersucht, welche Anwendungen sich für die Nutzung von \acs*{pim} eignen und welche Auswirkungen auf die Leistung zu erwarten sind.
|
||||||
|
|
||||||
\end{abstract}
|
\end{abstract}
|
||||||
|
|||||||
@@ -1,2 +1,7 @@
|
|||||||
\section{Appendix}
|
\section{Appendix}
|
||||||
\label{sec:appendix}
|
\label{sec:appendix}
|
||||||
|
|
||||||
|
% some source code,
|
||||||
|
% from the VM
|
||||||
|
% some microkernels
|
||||||
|
% ...
|
||||||
|
|||||||
@@ -388,7 +388,7 @@ To increase the number of columns, new entries of the input vector must be loade
|
|||||||
Therefore, it is necessary to execute the complete \ac{gemv} microkernel several times for the different input vector chunks and weight matrix columns.
|
Therefore, it is necessary to execute the complete \ac{gemv} microkernel several times for the different input vector chunks and weight matrix columns.
|
||||||
In general, the more the dimensions exceed the native \ac{pim} matrix dimensions, the more often the \ac{mac} core of the \ac{gemv} microkernel must be executed.
|
In general, the more the dimensions exceed the native \ac{pim} matrix dimensions, the more often the \ac{mac} core of the \ac{gemv} microkernel must be executed.
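As a rough sketch of this relationship, assume a weight matrix of $M \times N$ elements and native \ac{pim} matrix dimensions of $M_0 \times N_0$ (these symbols are introduced here only for illustration and do not appear elsewhere in the text).
If the matrix is processed one native-sized tile at a time, the number of tiles to be processed is
\[
  \left\lceil \frac{M}{M_0} \right\rceil \cdot \left\lceil \frac{N}{N_0} \right\rceil ,
\]
so the amount of work grows with the product of both ratios.
How these tiles map to repetitions of the \ac{mac} core or of the complete microkernel depends on the data layout and is not captured by this estimate.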
|
||||||
|
|
||||||
\subsubsection{Performance and Power Efficiency Achievements}
|
\subsubsection{Performance and Power Efficiency Effects}
|
||||||
|
|
||||||
In addition to stating the theoretical bandwidth provided to the \ac{pim} units, which is $\qty[per-mode=symbol]{128}{\giga\byte\per\second}$ per \ac{pch} or $\qty[per-mode=symbol]{2}{\tera\byte\per\second}$ in total for 16 \acp{pch}, Samsung also ran experiments on a real implementation of \aca{fimdram} to analyze its performance gains and power efficiency improvements.
|
In addition to stating the theoretical bandwidth provided to the \ac{pim} units, which is $\qty[per-mode=symbol]{128}{\giga\byte\per\second}$ per \ac{pch} or $\qty[per-mode=symbol]{2}{\tera\byte\per\second}$ in total for 16 \acp{pch}, Samsung also ran experiments on a real implementation of \aca{fimdram} to analyze its performance gains and power efficiency improvements.
|
||||||
This real system is based on a Xilinx Zynq UltraScale+ \ac{fpga} that sits on the same silicon interposer as four \aca{hbm} stacks, each consisting of one buffer die, four \aca{fimdram} dies, and four normal \aca{hbm} dies \cite{lee2021}.
|
This real system is based on a Xilinx Zynq UltraScale+ \ac{fpga} that sits on the same silicon interposer as four \aca{hbm} stacks, each consisting of one buffer die, four \aca{fimdram} dies, and four normal \aca{hbm} dies \cite{lee2021}.
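As a quick consistency check of the theoretical bandwidth quoted above, and reading the $\qty[per-mode=symbol]{128}{\giga\byte\per\second}$ as the figure for a single \ac{pch} with all 16 \acp{pch} operating in parallel, the total works out to
\[
  16 \cdot \qty[per-mode=symbol]{128}{\giga\byte\per\second} = \qty[per-mode=symbol]{2048}{\giga\byte\per\second} \approx \qty[per-mode=symbol]{2}{\tera\byte\per\second} .
\]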
|
||||||
|
|||||||