Update on Overleaf.
@@ -44,10 +44,13 @@
 % an abbreviated paper title here
 %
 \author{%
-Derek Christ\inst{1}\orcidID{0000-1111-2222-3333} \and
-Lukas Steiner\inst{2}\orcidID{1111-2222-3333-4444} \and
-Matthias Jung\inst{1,3}\orcidID{2222--3333-4444-5555} \and
-Norbert Wehn\inst{2}\orcidID{2222--3333-4444-5555}
+Derek Christ\inst{1}%\orcidID{0000-1111-2222-3333}
+\and
+Lukas Steiner\inst{2}%\orcidID{1111-2222-3333-4444}
+\and
+Matthias Jung\inst{1,3}%\orcidID{2222--3333-4444-5555}
+\and
+Norbert Wehn\inst{2}%\orcidID{2222--3333-4444-5555}
 }
 %
 \authorrunning{D. Christ et al.}
@@ -90,14 +93,14 @@ With these new architectures on the horizon, it becomes crucial for system-level
 
 This paper introduces a virtual prototype of Samsung's PIM-HBM, developed using open-source tools such as gem5~\cite{lowahm_20} and the memory simulator DRAMSys~\cite{stejun_20}. Additionally, the virtual prototype is accompanied by a custom Rust software library, simplifying the utilization of PIM functionality at the software level.
 
-In summary this paper makes the following contributions:
+In summary, this paper makes the following contributions:
 \begin{itemize}
 \item First full-system simulation of HBM-PIM with a virtual platform consisting of gem5 and DRAMSys
 \item Experimental verification of the virtual prototype with benchmarks
 \item A Rust library that exposes the PIM functionality at the software level
 \end{itemize}
 
-The paper is structured as follows ...
+The paper is structured as follows. Section 2 presents the related work in the area of PIM simulation. Section 3 gives a brief background on the relevant PIM architectures, whereas Section 4 explains the proposed PIM virtual platform. Sections 5 and 6 present the experimental simulation setup and the results, which are compared with already published results from PIM vendors. Section 7 concludes the paper.
 %
 \section{Related Work}
 % Onur Ramulator ?
@@ -105,7 +108,7 @@ To analyze the potential performance and power impact of Newton, SK Hynix develo
 The simulated system is compared to two different non-\ac{pim} systems: an ideal non-\ac{pim} host with infinite compute bandwidth and a \ac{gpu} model of a high-end Titan-V graphics card using a cycle-accurate \ac{gpu} simulator.
 SK Hynix finds that Newton achieves a \qty{54}{\times} speedup over the Titan-V \ac{gpu} model and a \qty{10}{\times} speedup over the ideal non-\ac{pim} host, which sets a lower bound on the acceleration over any possible non-\ac{pim} architecture.
 
-With the \textbf{PIMSimulator} \cite{shin-haengkang2023}, Samsung provides a virtual prototype of \ac{fimdram} also based on DRAMSim2.
+With the \textbf{PIMSimulator}~\cite{shin-haengkang2023}, Samsung provides a virtual prototype of \ac{fimdram}, also based on DRAMSim2.
 PIMSimulator offers two simulation modes: it can either accept pre-recorded memory traces or generate highly simplified memory traffic using a minimal host processor model that essentially executes only the \ac{pim}-related program regions.
 However, neither approach accurately models a complete system consisting of a host processor running a real compiled binary and the memory system that integrates \ac{fimdram}.
 As a result, only limited conclusions can be drawn about the performance impact of \ac{fimdram} and the changes that are required in the application code to support the new architecture.
@@ -172,7 +175,7 @@ In the \ac{aam} mode, the register indices of an instruction are ignored and dec
 With this method, the register indices and the bank address cannot get out of sync, as they are tightly coupled, even if the memory controller reorders the accesses.
 
 
-\section{VP}
+\section{PIM Virtual Platform}
 To build a virtual prototype of \aca{fimdram}, an accurate \ac{hbm2} model is needed, into which the additional \ac{pim}-\acp{pu} are integrated.
 For this, the cycle-accurate \ac{dram} simulator DRAMSys~\cite{steiner2022a} has been used, and its \ac{hbm2} model has been extended to incorporate the \acp{pu} into the \acp{pch} of the \ac{pim}-activated channels.
 The \aca{fimdram} model itself does not need to model any timing behavior: