Update on Overleaf.
@@ -1112,7 +1112,7 @@ Alternatively, a DRAM command trace can be provided as an input file.
|
|||||||
For the interface power calculation, the provided commands, addresses and data are translated into equivalent bit patterns using the command truth table of the simulated standard.
|
For the interface power calculation, the provided commands, addresses and data are translated into equivalent bit patterns using the command truth table of the simulated standard.
|
||||||
Based on this data, the number of transmitted zeros $n_0$, transmitted ones $n_1$ and zero to one transitions $n_{0 \rightarrow 1}$ can be calculated.
|
Based on this data, the number of transmitted zeros $n_0$, transmitted ones $n_1$ and zero to one transitions $n_{0 \rightarrow 1}$ can be calculated.
|
||||||
To achieve high simulation speeds, bit manipulation instructions including the population count (\texttt{POPCNT}) instruction are used.
|
To achieve high simulation speeds, bit manipulation instructions including the population count (\texttt{POPCNT}) instruction are used.
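The counting step described above can be sketched as follows. This is a minimal illustration, not DRAMPower's actual code: the function name \texttt{count\_bus\_activity}, its parameters and the treatment of the first word (no predecessor, so no transitions are counted for it) are assumptions made for this example, and Python's string-based bit count merely stands in for a hardware \texttt{POPCNT} instruction.

```python
def count_bus_activity(words, width):
    """Count zeros, ones and 0->1 transitions over a sequence of bus words.

    words: iterable of non-negative integers, one per transfer (hypothetical input)
    width: number of signal lines on the bus
    """
    mask = (1 << width) - 1
    n0 = n1 = n01 = 0
    prev = None
    for word in words:
        word &= mask
        ones = bin(word).count("1")  # population count of the current word
        n1 += ones
        n0 += width - ones
        if prev is not None:
            # Bits that were 0 in the previous word and are 1 in the current one.
            n01 += bin(~prev & word & mask).count("1")
        prev = word
    return n0, n1, n01


if __name__ == "__main__":
    print(count_bus_activity([0b0000, 0b1010, 0b0110], width=4))
```

For the three 4-bit words \texttt{0000}, \texttt{1010}, \texttt{0110}, this yields $n_0 = 8$, $n_1 = 4$ and $n_{0 \rightarrow 1} = 3$. Each word needs only one mask, one AND-NOT and two population counts, which is why the approach maps well to native bit manipulation instructions.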
|
||||||
If no data is provided, a switching activity $\alpha$ and a ratio between both logic levels \todo{name} has to be provided.
|
If no data is provided, a switching activity $\alpha$ and a duty cycle $D$ have to be provided.
|
||||||
In addition to the command/address and data bus, the remaining signals like the clock signal pair, data strobe pairs or chip select need to be considered (see Section~\ref{subsec:background_interface}).
|
In addition to the command/address and data bus, the remaining signals like the clock signal pair, data strobe pairs or chip select need to be considered (see Section~\ref{subsec:background_interface}).
|
||||||
%As explained in Section~\ref{sec:interface_power_modeling}, the interface power calculation can depend on lots of parameters and, thus, can become very complex.
|
%As explained in Section~\ref{sec:interface_power_modeling}, the interface power calculation can depend on lots of parameters and, thus, can become very complex.
|
||||||
%\todo{In order to avoid the complexity within DRAMPower, the tool only receives the precalculated termination and dynamic power values for all signals as inputs.}
|
%\todo{In order to avoid the complexity within DRAMPower, the tool only receives the precalculated termination and dynamic power values for all signals as inputs.}
|
||||||
@@ -1197,12 +1197,19 @@ The total power consumption can be queried at any time even when the simulation
|
|||||||
\subsection{Simulation Speed}
|
\subsection{Simulation Speed}
|
||||||
%
|
%
|
||||||
Since DRAMPower is not used as a standalone tool in the normal use case, but rather coupled to a behavioral DRAM subsystem simulator, we evaluate its simulation speed in terms of the overhead of adding power simulation.
|
Since DRAMPower is not used as a standalone tool in the normal use case, but rather coupled to a behavioral DRAM subsystem simulator, we evaluate its simulation speed in terms of the overhead of adding power simulation.
|
||||||
For this analysis, DRAMPower is coupled to the well-known DRAM subsystem simulator DRAMSys~\cite{}
|
For this analysis, DRAMPower is coupled to the well-known DRAM subsystem simulator DRAMSys~\cite{stejun_20}.
|
||||||
|
Within DRAMSys, \todo{one million read and write requests with random addresses and random data are generated.}
|
||||||
|
This simulation is carried out both with and without power simulation enabled.
|
||||||
As DRAMPower is not intended as a standalone simulator, we evaluate the simulation speed
|
Moreover, the simulations are also performed without actual data.
|
||||||
|
In this case, DRAMPower is provided with a switching activity $\alpha$ and a duty cycle $D$.
|
||||||
|
|
||||||
|
For the simulations with data, DRAMSys alone requires 896\,ms to finish, while with added power simulation,
|
||||||
|
it takes 1004\,ms to finish.
|
||||||
|
This corresponds to an overhead of 12\,\%.
|
||||||
|
When no data is simulated, DRAMSys alone requires only 559\,ms to finish, while with DRAMPower enabled, the simulation time increases to 774\,ms.
|
||||||
|
In this case, the overhead is 38\,\%.
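Both overhead figures follow directly from the runtimes above, taking the relative overhead as the added runtime divided by the runtime of DRAMSys alone:
\[
  \frac{1004\,\mathrm{ms} - 896\,\mathrm{ms}}{896\,\mathrm{ms}} \approx 12.1\,\%,
  \qquad
  \frac{774\,\mathrm{ms} - 559\,\mathrm{ms}}{559\,\mathrm{ms}} \approx 38.5\,\%.
\]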
|
||||||
|
While this overhead is relatively large, there are two things to consider.
|
||||||
|
First, DRAMSys is highly optimized for simulation speed and outperforms all other simulators
|
||||||
|
|
||||||
\begin{figure}
|
\begin{figure}
|
||||||
\centering
|
\centering
|
||||||
@@ -1250,36 +1257,34 @@ alternatively, duty cycle/toggling rates can be used
|
|||||||
%\input{content/05_exp_results}
|
%\input{content/05_exp_results}
|
||||||
\subsection{Simulation Accuracy}
|
\subsection{Simulation Accuracy}
|
||||||
%
|
%
|
||||||
|
\todo{
|
||||||
Interface -> comparison with SPICE, maybe use a random pattern in spice with fixed n0, n1 and alpha
|
Interface -> comparison with SPICE, maybe use a random pattern in spice with fixed n0, n1 and alpha
|
||||||
Core -> we do not yet have a measurement platform for DDR5/LPDDR5/HBM3... where we can issue specific command patterns to DRAM and compare it with the results provided by DRAMPower.
|
Core -> we do not yet have a measurement platform for DDR5/LPDDR5/HBM3... where we can issue specific command patterns to DRAM and compare it with the results provided by DRAMPower.
|
||||||
\todo{Marco, Derek}
|
}
|
||||||
% IDD Patterns mit Daimler Messung vergleichen
|
% IDD Patterns mit Daimler Messung vergleichen
|
||||||
To verify the power estimates of the new DRAMPower implementation, we use measurement data from DRAMs of three different vendors, as reported in a real LPDDR4 memory measurement platform study~\cite{feldmann_23}.
|
To verify the power estimates of the new DRAMPower implementation, we use core and interface power measurements of DRAMs from three different vendors, as reported in a study of a real LPDDR4 memory measurement platform~\cite{feldmann_23}.
|
||||||
Each DRAM is operated with six different access patterns, which are analogous to the following $I_{DD}$ currents:
|
Each DRAM is operated with six different access patterns, which are analogous to the following $I_{DD}$ currents:
|
||||||
\tikz{\node[circle,draw,inner sep=1pt] {\tiny 1}}~$I_{DD}0$*,
|
\tikz{\node[circle,draw,inner sep=1pt] {\tiny 1}}~$I_{DD0*}$,
|
||||||
\tikz{\node[circle,draw,inner sep=1pt] {\tiny 2}}~$I_{DD}4R$,
|
\tikz{\node[circle,draw,inner sep=1pt] {\tiny 2}}~$I_{DD4R}$,
|
||||||
\tikz{\node[circle,draw,inner sep=1pt] {\tiny 3}}~$I_{DD}4W$,
|
\tikz{\node[circle,draw,inner sep=1pt] {\tiny 3}}~$I_{DD4W}$,
|
||||||
\tikz{\node[circle,draw,inner sep=1pt] {\tiny 4}}~$I_{DD}5AB$,
|
\tikz{\node[circle,draw,inner sep=1pt] {\tiny 4}}~$I_{DD5B}$,
|
||||||
\tikz{\node[circle,draw,inner sep=1pt] {\tiny 5}}~$I_{DD}2N$ and
|
\tikz{\node[circle,draw,inner sep=1pt] {\tiny 5}}~$I_{DD2N}$ and
|
||||||
\tikz{\node[circle,draw,inner sep=1pt] {\tiny 6}}~$I_{DD}6$.
|
\tikz{\node[circle,draw,inner sep=1pt] {\tiny 6}}~$I_{DD6}$.
|
||||||
As it was not possible to reproduce the usual $I_{DD}0$ pattern of ACT-PRE for the measurement platform, $I_{DD}0$* is a variation using the pattern ACT-RD-PRE, which is also resembled in the DRAMPower simulation.
|
As it was not possible to reproduce the usual $I_{DD0}$ pattern of ACT-PRE on the measurement platform, $I_{DD0*}$ is a variation using the ACT-RD-PRE pattern, which is also replicated in the DRAMPower simulation.
|
||||||
Also, the measurement platform was not able to accurately measure the write current $I_{DD}4W$
|
In addition, the measurement platform could not accurately measure the write current $I_{DD4W}$ because only one write request could be issued at a time. The simulation was therefore also configured to limit the number of outstanding write requests to one.
|
||||||
The initial simulations are based on the current values specified in the datasheet of the specific vendor.
|
The initial simulations are based on the current values specified in the datasheet of the specific vendor.
|
||||||
Then, the measured current values are used in a second simulation.
|
Then, the measured current values are used in a second simulation.
|
||||||
The results are shown in Figure~\ref{fig:power_plot}.
|
The results are shown in Figure~\ref{fig:power_plot}.
|
||||||
\begin{figure}
|
\begin{figure}
|
||||||
\centering
|
\centering
|
||||||
% \resizebox{\linewidth}{!}{%
|
|
||||||
\input{img/power_plot}
|
\input{img/power_plot}
|
||||||
% }
|
|
||||||
\caption{Average Power Consumption of Simulations and Measurements for Different Vendors}
|
\caption{Average Power Consumption of Simulations and Measurements for Different Vendors}
|
||||||
\label{fig:power_plot}
|
\label{fig:power_plot}
|
||||||
\end{figure}
|
\end{figure}
|
||||||
As can be seen, the $I_{DD}$ currents in the datasheet are overly pessimistic for all vendors:
|
As can be seen, the $I_{DD}$ currents in the datasheet are overly pessimistic for all vendors:
|
||||||
The simulations based on the datasheets show on average a $4.8\times$ higher power consumption than the actual power measurements.
|
The simulations based on the datasheets show on average a $2.9\times$ higher power consumption than the actual power measurements.
|
||||||
However, when the measured currents are applied to the simulation, there is still a small discrepancy:
|
However, when the measured currents are applied to the simulation, the deviation drops to only around $18.8\%$.
|
||||||
This can be explained by the fact that the \todo{wrong:} measurement platform only measures the core power and not the interface power.
|
The largest deviation comes from the $I_{DD0*}$ current. It is unclear whether the measurement platform was able to fully saturate the memory controller's buffer; if it was not, it would report a lower average power consumption than the simulation.
|
||||||
As DRAMPower also includes interface power estimates, it therefore reports a higher total power.
|
|
||||||
|
|
||||||
% LP4 vs LP5
|
% LP4 vs LP5
|
||||||
% DDR4 vs. DDR5
|
% DDR4 vs. DDR5
|
||||||
|
|||||||
@@ -240,3 +240,21 @@ Cg\_type: Outlook\\
|
|||||||
Subject\_term: Machine learning, Sustainability, Technology, Computer science, Engineering},
|
Subject\_term: Machine learning, Sustainability, Technology, Computer science, Engineering},
|
||||||
file = {/Users/myzinsky/Zotero/storage/2XJ6LXCA/d41586-024-03408-z.html}
|
file = {/Users/myzinsky/Zotero/storage/2XJ6LXCA/d41586-024-03408-z.html}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@InProceedings{stejun_20,
|
||||||
|
author="Steiner, Lukas
|
||||||
|
and Jung, Matthias
|
||||||
|
and Prado, Felipe S.
|
||||||
|
and Bykov, Kirill
|
||||||
|
and Wehn, Norbert",
|
||||||
|
editor="Orailoglu, Alex
|
||||||
|
and Jung, Matthias
|
||||||
|
and Reichenbach, Marc",
|
||||||
|
title="{DRAMSys4.0}: A Fast and Cycle-Accurate {SystemC/TLM}-Based {DRAM} Simulator",
|
||||||
|
booktitle="Embedded Computer Systems: Architectures, Modeling, and Simulation",
|
||||||
|
year="2020",
|
||||||
|
publisher="Springer International Publishing",
|
||||||
|
address="Cham",
|
||||||
|
pages="110--126",
|
||||||
|
isbn="978-3-030-60939-9"
|
||||||
|
}
|
||||||