Add abstract
@@ -388,7 +388,7 @@ To increase the number of columns, new entries of the input vector must be loade
Therefore, it is necessary to execute the complete \ac{gemv} microkernel several times for the different input vector chunks and weight matrix columns.
In general, the more the dimensions exceed the native \ac{pim} matrix dimensions, the more often the \ac{mac} core of the \ac{gemv} microkernel must be executed.
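As a rough sketch of this relationship, assume the native \ac{pim} matrix dimensions are written as $M_{\mathrm{pim}} \times N_{\mathrm{pim}}$ and the weight matrix has dimensions $M \times N$ (these symbols are introduced here only for illustration and are not taken from the reference implementation).
If every native-sized tile requires one pass of the \ac{mac} core, the number of passes $k$ can be estimated as
\[
    k \approx \left\lceil \frac{M}{M_{\mathrm{pim}}} \right\rceil \cdot \left\lceil \frac{N}{N_{\mathrm{pim}}} \right\rceil .
\]
Under this assumption, doubling either dimension beyond its native size roughly doubles the number of passes, while the exact count depends on how the tiles are distributed across the \acp{pch}.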
\subsubsection{Performance and Power Efficiency Achievements}
\subsubsection{Performance and Power Efficiency Effects}
The theoretical bandwidth provided to the \ac{pim} units is $\qty[per-mode=symbol]{128}{\giga\byte\per\second}$ per \ac{pch}, or $\qty[per-mode=symbol]{2}{\tera\byte\per\second}$ in total for 16 \acp{pch}.
Beyond these theoretical figures, Samsung also ran experiments on a real implementation of \aca{fimdram} to analyze its performance gains and power efficiency improvements.
This real system is based on a Xilinx Zynq UltraScale+ \ac{fpga} that sits on the same silicon interposer as four \aca{hbm} stacks, each consisting of one buffer die, four \aca{fimdram} dies, and four normal \aca{hbm} dies \cite{lee2021}.