Update on Overleaf.

2024-03-22 08:44:35 +00:00
parent 550e7c9c80
commit 60ed5de838
1 changed files with 4 additions and 3 deletions
--- a/samplepaper.tex
+++ b/samplepaper.tex
@@ -147,7 +147,7 @@ One real \ac{pim} implementation of the major \ac{dram} manufacturer Samsung, ca
 A special feature of \aca{fimdram} is that it does not require any changes to components of modern processors, such as the memory controller, i.e., it is agnostic to existing \aca{hbm2} platforms.
 Consequently, for the operation of the \acp{pu}, mode switching is required for \aca{fimdram}, which makes it less useful for interleaved \ac{pim} and non-\ac{pim} traffic and small batch sizes.

-At the heart of \aca{fimdram} are the \acp{pu}, where one of which is shared by two banks each of a \ac{pch}.
+At the heart of \aca{fimdram} lie the \acp{pu}, each of which is shared by two banks of the same \ac{pch}.
 The architecture of such a \ac{pu} is illustrated in \cref{fig:pu}.

 \begin{figure}
@@ -157,7 +157,8 @@ The architecture of such a \ac{pu} is illustrated in \cref{fig:pu}.
    \label{fig:pu}
 \end{figure}

-A \ac{pu} includes 16 16-bit wide \ac{simd} \acp{fpu}, \acp{crf}, \acp{grf} and \acp{srf} \cite{lee2021}.
+A \ac{pu} contains two sets of \acp{fpu}, one for addition and one for multiplication, each consisting of 16 16-bit wide \ac{simd} \acp{fpu}.
+Besides the \acp{fpu}, a \ac{pu} contains a \ac{crf}, a \ac{grf} and a \ac{srf} \cite{lee2021}.
 The 16-wide \ac{simd} units correspond to the 256-bit prefetch architecture of \aca{hbm2}, where 16 16-bit floating-point operands are passed directly from the \acp{ssa} to the \acp{fpu} from a single memory access.
 As all \ac{pim} units operate in parallel, with 16 banks and therefore 8 \acp{pu} per \ac{pch}, a single memory access loads a total of $\qty{256}{\bit}\cdot\qty{8}{\acp{pu}}=\qty{2048}{\bit}$ into the \acp{fpu}.
 As a result, the theoretical internal bandwidth of \aca{fimdram} is $\qty{8}{\times}$ higher than the external bus bandwidth to the host processor.
@@ -174,7 +175,7 @@ Both in \ac{ab} mode and in \ac{abp} mode, the total \aca{hbm2} bandwidth per \a

 Due to the focus on \ac{dnn} applications in \aca{fimdram}, the native data type for the \acp{fpu} is \ac{fp16}, which is motivated by the significantly lower area and power requirements of \ac{fp16} \acp{fpu} compared to those for 32-bit \ac{fp} numbers.
 The \ac{simd} \acp{fpu} of the processing units are implemented once as a \ac{fp16} multiplier unit, and once as a \ac{fp16} adder unit, providing support for these basic algorithmic operations.
-In addition to the \acp{fpu}, a processing unit consists also of \acp{crf}, \acp{srf} and \acp{grf}.
+
 The \ac{crf} acts as an instruction buffer, holding the 32 32-bit instructions to be executed by the processor when performing a memory access.
 One program that is stored in the \ac{crf} is called a \textit{microkernel}.
 Each \ac{grf} consists of 16 registers, each with the \aca{hbm2} prefetch size of 256 bits, where each entry can hold the data of a full memory burst.