Update on Overleaf.

This commit is contained in:
2024-03-22 08:51:10 +00:00
committed by node
parent 60ed5de838
commit e1e7ad750e

@@ -157,7 +157,7 @@ The architecture of such a \ac{pu} is illustrated in \cref{fig:pu}.
\label{fig:pu}
\end{figure}
A \ac{pu} contains two sets of \acp{fpu}, one for addition and one for multiplication, where each set contains 16 16-bit wide \ac{simd} \acp{fpu} each.
A \ac{pu} contains two sets of \ac{simd} \acp{fpu}, one for addition and one for multiplication, where each set contains 16 16-bit wide \acp{fpu}.
Besides the \acp{fpu}, a \ac{pu} contains a \ac{crf}, a \ac{grf} and a \ac{srf} \cite{lee2021}.
The 16-wide \ac{simd} units correspond to the 256-bit prefetch architecture of \aca{hbm2}, where 16 16-bit floating-point operands are passed directly from the \acp{ssa} to the \acp{fpu} from a single memory access.
As all \ac{pim} units operate in parallel, with 16 banks per \ac{pch}, a single memory access loads a total of $\qty{256}{\bit}\cdot\qty{8}{\acp{pu}}=\qty{2048}{\bit}$ into the \acp{fpu}.
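As a quick check of these figures, assuming that each \ac{pu} is shared by two banks (so that the 16 banks of a \ac{pch} yield $16/2=8$ \acp{pu}), one access delivers per \ac{pu}
\[
  16\cdot\qty{16}{\bit}=\qty{256}{\bit},
\]
and across all \acp{pu} of a \ac{pch}
\[
  8\cdot\qty{256}{\bit}=\qty{2048}{\bit}.
\]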
@@ -177,7 +177,7 @@ Due to the focus on \ac{dnn} applications in \aca{fimdram}, the native data type
The \ac{simd} \acp{fpu} of the processing units are implemented once as a \ac{fp16} multiplier unit, and once as a \ac{fp16} adder unit, providing support for these basic algorithmic operations.
The \ac{crf} acts as an instruction buffer, holding the 32 32-bit instructions to be executed by the processor when performing a memory access.
One program that is stored in the \ac{crf} is called a \textit{microkernel}.
A program that is stored in the \ac{crf} is called a \textit{microkernel}.
Each \ac{grf} consists of 16 registers, each with the \aca{hbm2} prefetch size of 256 bits, where each entry can hold the data of a full memory burst.
The \ac{grf} of a processing unit is divided into two halves (\ac{grf}-A and \ac{grf}-B), with eight register entries allocated to each of the two banks.
Finally, in the \acp{srf}, a 16-bit scalar value is replicated $\qty{16}{\times}$ as it is fed into the 16-wide \ac{simd} \ac{fpu} as a constant summand or factor for an addition or multiplication.
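The register sizes implied by these numbers can be tallied as a rough sketch, assuming no storage beyond the entries named above.
The \ac{crf} holds
\[
  32\cdot\qty{32}{\bit}=\qty{1024}{\bit},
\]
the \ac{grf} holds
\[
  16\cdot\qty{256}{\bit}=\qty{4096}{\bit},
  \qquad\mbox{that is,}\qquad
  8\cdot\qty{256}{\bit}=\qty{2048}{\bit}
\]
per half, and a replicated \ac{srf} scalar spans
\[
  16\cdot\qty{16}{\bit}=\qty{256}{\bit},
\]
which matches the width of one \ac{grf} entry and of the 16-wide \ac{simd} \ac{fpu} input.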