Samsung comparison
@@ -390,6 +390,7 @@ Therefore, it is necessary to execute the entire \ac{gemv} microkernel several t
In general, the more the dimensions exceed the native \ac{pim} matrix dimensions, the more often the \ac{mac} core of the \ac{gemv} microkernel must be executed.
\subsubsection{Performance and Power Efficiency Effects}
\label{sec:fimdram_performance}
In addition to the theoretical bandwidth available to the \ac{pim} units, $\qty[per-mode=symbol]{128}{\giga\byte\per\second}$ per \ac{pch} or about $\qty[per-mode=symbol]{2}{\tera\byte\per\second}$ in total across 16 \acp{pch}, Samsung also ran experiments on a real implementation of \aca{fimdram} to measure its performance gains and power efficiency improvements.
This real system is based on a Xilinx Zynq UltraScale+ \ac{fpga} that is integrated onto the same silicon interposer as four \aca{hbm} stacks, each consisting of one buffer die, four \aca{fimdram} dies, and four normal \aca{hbm} dies \cite{lee2021}.