Update on Overleaf.
@@ -157,7 +157,7 @@ The architecture of such a \ac{pu} is illustrated in \cref{fig:pu}.
\label{fig:pu}
\end{figure}
A \ac{pu} contains two sets of \acp{fpu}, one for addition and one for multiplication, where each set contains 16 16-bit wide \ac{simd} \acp{fpu} each.
A \ac{pu} contains two sets of \ac{simd} \acp{fpu}, one for addition and one for multiplication, where each set contains 16 16-bit wide \acp{fpu}.
Besides the \acp{fpu}, a \ac{pu} contains a \ac{crf}, a \ac{grf} and a \ac{srf} \cite{lee2021}.
The 16-wide \ac{simd} units match the 256-bit prefetch architecture of \aca{hbm2}: a single memory access passes 16 16-bit floating-point operands directly from the \acp{ssa} to the \acp{fpu}.
As all \ac{pim} units operate in parallel and the 16 banks of a \ac{pch} are served by eight \acp{pu}, a single memory access loads a total of $\qty{256}{\bit}\cdot\qty{8}{\acp{pu}}=\qty{2048}{\bit}$ into the \acp{fpu}.
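Spelled out as a quick check, and assuming that the eight \acp{pu} in the product above are those of a single \ac{pch}, the operand width per \ac{pu} and the total width per memory access follow from the stated figures as
\[
16\cdot\qty{16}{\bit}=\qty{256}{\bit}, \qquad 8\cdot\qty{256}{\bit}=\qty{2048}{\bit}=\qty{256}{\byte}.
\]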
@@ -177,7 +177,7 @@ Due to the focus on \ac{dnn} applications in \aca{fimdram}, the native data type
The \ac{simd} \acp{fpu} of the processing units are implemented once as a \ac{fp16} multiplier unit and once as a \ac{fp16} adder unit, providing support for these basic arithmetic operations.
The \ac{crf} acts as an instruction buffer, holding the 32 instructions of 32 bits each that the processing unit executes when a memory access is performed.
One program that is stored in the \ac{crf} is called a \textit{microkernel}.
A program that is stored in the \ac{crf} is called a \textit{microkernel}.
Each \ac{grf} consists of 16 registers, each 256 bits wide to match the \aca{hbm2} prefetch size, so that every entry can hold the data of a full memory burst.
The \ac{grf} of a processing unit is divided into two halves (\ac{grf}-A and \ac{grf}-B), with eight register entries allocated to each of the two banks.
Finally, a 16-bit scalar value held in the \ac{srf} is replicated $\qty{16}{\times}$ when it is fed into the 16-wide \ac{simd} \ac{fpu}, where it serves as a constant summand or factor for an addition or multiplication.
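As a consistency check derived only from the sizes stated in this section (not from additional data in \cite{lee2021}), the \ac{crf} holds $32\cdot\qty{32}{\bit}=\qty{1024}{\bit}$ and the \ac{grf} holds $16\cdot\qty{256}{\bit}=\qty{4096}{\bit}$, that is, $8\cdot\qty{256}{\bit}=\qty{2048}{\bit}$ per half.
A replicated scalar operand spans $16\cdot\qty{16}{\bit}=\qty{256}{\bit}$, which is exactly the width of one \ac{grf} entry.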