Samsung comparison

2024-03-01 19:44:43 +01:00
parent 47796cdae5
commit 30db51a8de
2 changed files with 64 additions and 6 deletions

@@ -390,6 +390,7 @@ Therefore, it is necessary to execute the entire \ac{gemv} microkernel several t
In general, the more the dimensions exceed the native \ac{pim} matrix dimensions, the more often the \ac{mac} core of the \ac{gemv} microkernel must be executed.
\subsubsection{Performance and Power Efficiency Effects}
\label{sec:fimdram_performance}
Besides stating the theoretical bandwidth available to the \ac{pim} units, $\qty[per-mode=symbol]{128}{\giga\byte\per\second}$ per \ac{pch} or a total of $\qty[per-mode=symbol]{2}{\tera\byte\per\second}$ for 16 \acp{pch}, Samsung also ran experiments on a real implementation of \aca{fimdram} to analyze its performance gains and power efficiency improvements.
This real system is based on a Xilinx Zynq UltraScale+ \ac{fpga} that is integrated onto the same silicon interposer as four \aca{hbm} stacks, each consisting of one buffer die, four \aca{fimdram} dies and four normal \aca{hbm} dies \cite{lee2021}.
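
As a quick consistency check of the bandwidth figures quoted above, and assuming that the $\qty[per-mode=symbol]{128}{\giga\byte\per\second}$ figure applies to each individual \ac{pch}, the aggregate value for 16 \acp{pch} follows as
\[
    16 \cdot \qty[per-mode=symbol]{128}{\giga\byte\per\second} = \qty[per-mode=symbol]{2048}{\giga\byte\per\second} \approx \qty[per-mode=symbol]{2}{\tera\byte\per\second}.
\]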