diff --git a/samplepaper.tex b/samplepaper.tex
index ac4ca69..b5ce7ef 100644
--- a/samplepaper.tex
+++ b/samplepaper.tex
@@ -157,7 +157,7 @@ The architecture of such a \ac{pu} is illustrated in \cref{fig:pu}.
 \label{fig:pu}
 \end{figure}
 
-A \ac{pu} contains two sets of \acp{fpu}, one for addition and one for multiplication, where each set contains 16 16-bit wide \ac{simd} \acp{fpu} each.
+A \ac{pu} contains two sets of \ac{simd} \acp{fpu}, one for addition and one for multiplication, each of which contains 16 16-bit wide \acp{fpu}.
 Besides the \acp{fpu}, a \ac{pu} contains a \ac{crf}, a \ac{grf} and a \ac{srf} \cite{lee2021}.
 The 16-wide \ac{simd} units correspond to the 256-bit prefetch architecture of \aca{hbm2}, where 16 16-bit floating-point operands are passed directly from the \acp{ssa} to the \acp{fpu} from a single memory access.
 As all \ac{pim} units operate in parallel, with 16 banks per \ac{pch}, a singular memory access loads a total of $\qty{256}{\bit}\cdot\qty{8}{\acp{pu}}=\qty{2048}{\bit}$ into the \acp{fpu}.
@@ -177,7 +177,7 @@ Due to the focus on \ac{dnn} applications in \aca{fimdram}, the native data type
 
 The \ac{simd} \acp{fpu} of the processing units is implemented once as a \ac{fp16} multiplier unit, and once as a \ac{fp16} adder unit, providing support for these basic algorithmic operations.
 The \ac{crf} acts as an instruction buffer, holding the 32 32-bit instructions to be executed by the processor when performing a memory access.
-One program that is stored in the \ac{crf} is called a \textit{microkernel}.
+A program that is stored in the \ac{crf} is called a \textit{microkernel}.
 Each \ac{grf} consists of 16 registers, each with the \aca{hbm2} prefetch size of 256 bits, where each entry can hold the data of a full memory burst.
 The \ac{grf} of a processing unit is divided into two halves (\ac{grf}-A and \ac{grf}-B), with eight register entries allocated to each of the two banks.
Finally, in the \acp{srf}, a 16-bit scalar value is replicated $\qty{16}{\times}$ as it is fed into the 16-wide \ac{simd} \ac{fpu} as a constant summand or factor for an addition or multiplication.
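The sizes quoted in this change (256-bit prefetch per PU, 2048 bits per access across a pCH, a 32 x 32-bit CRF, a 16 x 256-bit GRF) and the SRF scalar broadcast can be cross-checked with a small Python sketch. It is a minimal model under stated assumptions, not the FIMDRAM hardware or its programming interface: it assumes one PU is shared by two banks (which is what 16 banks yielding 8 PUs implies), it emulates the FP16 FPUs with the standard library's `struct` half-precision `e` format (round-to-nearest-even, an assumption about the real units' rounding), and the helper names `to_fp16`, `srf_broadcast`, `simd_mul` and `simd_add` are hypothetical.

```python
import operator
import struct

# Figures from the text. PUS_PER_PCH assumes one PU shared by two banks,
# matching the 16 banks / 8 PUs per pseudo channel quoted above.
LANES = 16                       # SIMD width of each FPU set
FP16_BITS = 16                   # native data type
BANKS_PER_PCH = 16
PUS_PER_PCH = BANKS_PER_PCH // 2
CRF_ENTRIES, CRF_INSTR_BITS = 32, 32
GRF_ENTRIES = 16                 # split into GRF-A and GRF-B halves of 8

PREFETCH_BITS = LANES * FP16_BITS            # one HBM2 burst per PU
ACCESS_BITS = PREFETCH_BITS * PUS_PER_PCH    # all PUs of a pCH together
CRF_BITS = CRF_ENTRIES * CRF_INSTR_BITS
GRF_BITS = GRF_ENTRIES * PREFETCH_BITS       # each entry holds one burst


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 value.

    struct raises OverflowError for values outside the FP16 range; how the
    real FPUs handle overflow is not described in the text.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


def srf_broadcast(scalar: float) -> list[float]:
    """Replicate one 16-bit SRF scalar into all 16 SIMD lanes."""
    return [to_fp16(scalar)] * LANES


def _lanewise(op, a: list[float], b: list[float]) -> list[float]:
    # Operands are assumed to be FP16 values already, so op() is exact in
    # double precision and to_fp16 applies the single rounding step.
    if len(a) != LANES or len(b) != LANES:
        raise ValueError(f"expected {LANES} lanes per operand")
    return [to_fp16(op(x, y)) for x, y in zip(a, b)]


def simd_mul(a: list[float], b: list[float]) -> list[float]:
    """Hypothetical model of the 16-wide FP16 multiplier unit."""
    return _lanewise(operator.mul, a, b)


def simd_add(a: list[float], b: list[float]) -> list[float]:
    """Hypothetical model of the 16-wide FP16 adder unit."""
    return _lanewise(operator.add, a, b)


if __name__ == "__main__":
    print(PREFETCH_BITS, ACCESS_BITS, CRF_BITS, GRF_BITS)  # prints: 256 2048 1024 4096
    burst = [to_fp16(i * 0.5) for i in range(LANES)]   # one GRF entry
    scaled = simd_mul(burst, srf_broadcast(2.0))        # SRF scalar as factor
    print(simd_add(scaled, srf_broadcast(1.0)))         # SRF scalar as summand
```

Rounding each lane's result back to FP16 after every operation reflects the FP16 multiplier and adder units described above; one rounding per operation is sufficient here because the exact product or sum of two FP16 values always fits in a double-precision Python float.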