Update on Overleaf.
@@ -127,6 +127,7 @@
|
|||||||
|
|
||||||
\newcommand\todo[1]{\textcolor{red}{#1}}
|
\newcommand\todo[1]{\textcolor{red}{#1}}
|
||||||
\hyphenation{pre-charged}
|
\hyphenation{pre-charged}
|
||||||
|
\hyphenation{DRAMPower}
|
||||||
|
|
||||||
%\received{20 February 2007}
|
%\received{20 February 2007}
|
||||||
%\received[revised]{12 March 2009}
|
%\received[revised]{12 March 2009}
|
||||||
@@ -207,7 +208,15 @@
|
|||||||
%% The abstract is a short summary of the work to be presented in the
|
%% The abstract is a short summary of the work to be presented in the
|
||||||
%% article.
|
%% article.
|
||||||
\begin{abstract}
|
\begin{abstract}
|
||||||
As memory-intensive applications continue to drive demand for efficient, high-performance DRAM, accurately modeling DRAM power consumption has become critical for optimizing system design and meeting energy-efficiency requirements. DRAM power consumption is a significant contributor to overall system power, especially in data centers, mobile devices, and edge computing, where power budgets are often constrained. DRAMPower 5 addresses this need by offering an open-source, detailed power analysis tool that now supports the latest JEDEC standards, including DDR5 and LPDDR5. This latest version introduces a refined interface power model, capturing specific interface dynamics that are increasingly relevant for emerging DRAM technologies, enhancing the accuracy of power estimations. Furthermore, DRAMPower 5 is designed with a flexible, modular architecture, enabling straightforward extensibility to support future DRAM standards and custom configurations. These features make DRAMPower 5 an essential tool for researchers and engineers focused on precise, scalable power analysis for current and next-generation DRAM systems.
|
\todo{frequency vs. data rate}
|
||||||
|
As memory-intensive applications continue to drive demand for high DRAM bandwidth and capacity, accurately modeling the DRAM power consumption has become critical for optimizing system design and meeting power budgets.
|
||||||
|
Unfortunately, existing open-source DRAM power simulators support only older generations of DRAM standards, while current system designs mainly rely on the newest generation, including DDR5, LPDDR5, and HBM3.
|
||||||
|
These standards support very high data rates beyond 10\,Gbps/pin.
|
||||||
|
This paper presents DRAMPower 5, a revised version of the popular DRAMPower simulator,
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
%DRAMPower 5 addresses this need by offering an open-source, detailed power analysis tool that now supports the latest JEDEC standards, including DDR5 and LPDDR5. This latest version introduces a refined interface power model, capturing specific interface dynamics that are increasingly relevant for emerging DRAM technologies, enhancing the accuracy of power estimations. Furthermore, DRAMPower 5 is designed with a flexible, modular architecture, enabling straightforward extensibility to support future DRAM standards and custom configurations. These features make DRAMPower 5 an essential tool for researchers and engineers focused on precise, scalable power analysis for current and next-generation DRAM systems.
|
||||||
\end{abstract}
|
\end{abstract}
|
||||||
%\paperlogo
|
%\paperlogo
|
||||||
\maketitle
|
\maketitle
|
||||||
@@ -222,9 +231,7 @@ As memory-intensive applications continue to drive demand for efficient, high-pe
|
|||||||
%DRAM Background: Short Intro of DRAM Interface and Core, single ended bidirection DQ, differential data strobe, data sampled when DQS\_t and DQS\_c cross -> double data rate
|
%DRAM Background: Short Intro of DRAM Interface and Core, single ended bidirection DQ, differential data strobe, data sampled when DQS\_t and DQS\_c cross -> double data rate
|
||||||
%
|
%
|
||||||
\section{Introduction}
|
\section{Introduction}
|
||||||
%Keywords: data centric applications, huge DRAM requirements, study found out that DRAM contributes ... to total power consumption
|
|
||||||
%
|
%
|
||||||
See abstract!!
|
|
||||||
The recent expansion of memory-intensive applications has led to increased demand for DRAM bandwidth and capacity in current computing systems.
|
The recent expansion of memory-intensive applications has led to increased demand for DRAM bandwidth and capacity in current computing systems.
|
||||||
This demand is particularly pronounced in \textit{Artificial Intelligence} (AI) applications, where specialized accelerator chips with immense DRAM bandwidths beyond 1\,TBps are used.
|
This demand is particularly pronounced in \textit{Artificial Intelligence} (AI) applications, where specialized accelerator chips with immense DRAM bandwidths beyond 1\,TBps are used.
|
||||||
However, these bandwidths come at the cost of high power consumption.
|
However, these bandwidths come at the cost of high power consumption.
|
||||||
@@ -236,7 +243,7 @@ Nevertheless, it is equally important to accurately estimate DRAM power consumpt
|
|||||||
In the current state of the art, there are two widely used open-source simulation tools for estimating DRAM power consumption, namely \textit{DRAMPower}~\cite{kargoo_14} and \textit{CACTI-IO}~\cite{joukah_12,joukah_15}.
|
In the current state of the art, there are two widely used open-source simulation tools for estimating DRAM power consumption, namely \textit{DRAMPower}~\cite{kargoo_14} and \textit{CACTI-IO}~\cite{joukah_12,joukah_15}.
|
||||||
DRAMPower focuses on the power consumption of the DRAM core, while CACTI-IO models the power consumption of the DRAM interface.
|
DRAMPower focuses on the power consumption of the DRAM core, while CACTI-IO models the power consumption of the DRAM interface.
|
||||||
Unfortunately, neither tool has been updated in recent years, so both support only older DRAM standards.
|
Unfortunately, neither tool has been updated in recent years, so both support only older DRAM standards.
|
||||||
At the same time, current-generation DRAM standards such as DDR5, LPDDR5 and HBM3 operate at much higher frequencies than their predecessors, use novel interconnection techniques, and offer many new features, all of which require special consideration in power modeling.
|
At the same time, current-generation DRAM standards such as DDR5, LPDDR5 and HBM3 operate at much higher data rates than their predecessors, use novel interconnection techniques, and offer many new features, all of which require special consideration in power modeling.
|
||||||
To the best of our knowledge, there is no open-source DRAM power simulator that provides accurate models of both the DRAM core and interface, and supports current generation DRAM standards such as DDR5, LPDDR5 and HBM3.
|
To the best of our knowledge, there is no open-source DRAM power simulator that provides accurate models of both the DRAM core and interface, and supports current generation DRAM standards such as DDR5, LPDDR5 and HBM3.
|
||||||
To fill this gap, this paper presents DRAMPower 5, a completely revised version of the DRAMPower simulator, with a greatly enhanced feature set including both core and interface power modeling, an efficient simulation kernel, \todo{accuracy?} and support for the latest DRAM standards.
|
To fill this gap, this paper presents DRAMPower 5, a completely revised version of the DRAMPower simulator, with a greatly enhanced feature set including both core and interface power modeling, an efficient simulation kernel, \todo{accuracy?} and support for the latest DRAM standards.
|
||||||
|
|
||||||
@@ -247,6 +254,7 @@ This paper makes the following new contributions:
|
|||||||
\item We show that at high operating frequencies, the approximations commonly used for interface power modeling result in large errors and a different modeling approach is required.
|
\item We show that at high operating frequencies, the approximations commonly used for interface power modeling result in large errors and a different modeling approach is required.
|
||||||
\item We present a new simulator architecture that can be easily extended with new standards or features and achieves high simulation speeds.
|
\item We present a new simulator architecture that can be easily extended with new standards or features and achieves high simulation speeds.
|
||||||
\item \todo{Accuracy simulations}
|
\item \todo{Accuracy simulations}
|
||||||
|
\item \todo{supported standards!!!}
|
||||||
\end{itemize}
|
\end{itemize}
|
||||||
|
|
||||||
The rest of the paper is structured as follows \todo{...}
|
The rest of the paper is structured as follows \todo{...}
|
||||||
@@ -399,9 +407,9 @@ If a more accurate modeling is required, the calculations can still be refined w
|
|||||||
%%%%
|
%%%%
|
||||||
The following section provides an overview of these currents.
|
The following section provides an overview of these currents.
|
||||||
%
|
%
|
||||||
\subsection{Current Measurement Conditions}
|
\subsection{Current Measurement Conditions}\label{subsec:current_measurement}
|
||||||
%
|
%
|
||||||
The minimum set specified in all JEDEC standards includes the following nine currents:
|
The minimum set specified in all DRAM standards includes the following nine currents:
|
||||||
%
|
%
|
||||||
\begin{itemize}
|
\begin{itemize}
|
||||||
\item $I_{DD0}$ (Operating one bank active-precharge current): Activate and precharge commands are sent alternately with minimum spacing. The target bank is toggled with each activate command.
|
\item $I_{DD0}$ (Operating one bank active-precharge current): Activate and precharge commands are sent alternately with minimum spacing. The target bank is toggled with each activate command.
|
||||||
@@ -415,7 +423,7 @@ The minimum set specified in all JEDEC standards includes the following nine cur
|
|||||||
\item $I_{DD6}$ (Self refresh current): The device is in self-refresh operation and the external clock is turned off.
|
\item $I_{DD6}$ (Self refresh current): The device is in self-refresh operation and the external clock is turned off.
|
||||||
\end{itemize}
|
\end{itemize}
|
||||||
%
|
%
|
||||||
Unfortunately, JEDEC is very inconsistent in specifying the currents.
|
Unfortunately, the different JEDEC subcommittees, which are responsible for formulating DRAM standards, are very inconsistent in specifying the currents.
|
||||||
Apart from different naming schemes, the measurement conditions mentioned above apply only to standards of the DDR family, while they differ for LPDDR, GDDR and HBM.
|
Apart from different naming schemes, the measurement conditions mentioned above apply only to standards of the DDR family, while they differ for LPDDR, GDDR and HBM.
|
||||||
For example, LPDDR measures $I_{DD3N}$, $I_{DD3P}$, $I_{DD4R}$ and $I_{DD4W}$ with only one bank active.
|
For example, LPDDR measures $I_{DD3N}$, $I_{DD3P}$, $I_{DD4R}$ and $I_{DD4W}$ with only one bank active.
|
||||||
GDDR measures $I_{DD3N}$ and $I_{DD3P}$ with one bank active, while $I_{DD4R}$ and $I_{DD4W}$ are measured with one bank in each bank group active.
|
GDDR measures $I_{DD3N}$ and $I_{DD3P}$ with one bank active, while $I_{DD4R}$ and $I_{DD4W}$ are measured with one bank in each bank group active.
|
||||||
@@ -512,6 +520,8 @@ Thus, the equations need to be adapted accordingly, i.e., for GDDR, $I_{DD3N}$ m
|
|||||||
%
|
%
|
||||||
\subsection{Refresh Power}\label{subsec:refresh}
|
\subsection{Refresh Power}\label{subsec:refresh}
|
||||||
%
|
%
|
||||||
|
As explained in Section~\ref{subsec:current_measurement}, JEDEC
|
||||||
|
%
|
||||||
\begin{figure}
|
\begin{figure}
|
||||||
\centering
|
\centering
|
||||||
\resizebox{\linewidth}{!}{%
|
\resizebox{\linewidth}{!}{%
|
||||||
@@ -545,19 +555,18 @@ Same-bank refresh for device with \textit{BG} bank groups and \textit{BA} banks
|
|||||||
%
|
%
|
||||||
\section{Interface Power Modeling}
|
\section{Interface Power Modeling}
|
||||||
%
|
%
|
||||||
Interface power refers to the power consumed/dissipated by the input/output (I/O) circuitry that connects the memory controller and DRAM devices.
|
|
||||||
In contrast to the core power, which is fixed for a specific device, the interface power depends on the complete DRAM subsystem architecture, i.e., the physical layer (PHY) of the memory controller, the channel architecture (ranks...), interconnect type...
|
|
||||||
|
|
||||||
\todo{cite \cite{dalpou_98} \cite{bak_90}}
|
\todo{cite \cite{dalpou_98} \cite{bak_90}}
|
||||||
%
|
%
|
||||||
|
Interface power refers to the power consumed by the drivers for the communication between memory controller and DRAM devices.
|
||||||
|
In contrast to the core power, which is fixed for a specific device, the interface power depends on the complete DRAM subsystem architecture, i.e., the physical layer (PHY) of the memory controller, the channel architecture (e.g., number of ranks), the channel characteristics (e.g., channel loss and parasitic capacitances) and the DRAM PHYs.
|
||||||
It can be divided into two parts:
|
It can be divided into two parts:
|
||||||
%
|
%
|
||||||
\begin{itemize}
|
\begin{itemize}
|
||||||
\item \textit{Termination power} is dissipated through the termination resistances required for signal integrity. \todo{Depending on the termination scheme, power is dissipated at both logic levels or only one logic level.}
|
\item \textit{Termination power} is dissipated across the termination resistances required for signal integrity.
|
||||||
\item \textit{Dynamic power} is dissipated through the lossy charging and discharging of parasitic capacitances \todo{and transmission lines}. It only appears when the signal toggles.
|
\item \textit{Dynamic power} is dissipated through the lossy charging and discharging of parasitic capacitances and the signaling over a lossy transmission line.
|
||||||
\end{itemize}
|
\end{itemize}
|
||||||
%
|
%
|
||||||
In the following, the calculation of termination power and dynamic power is explained.
|
In the following two sections, the calculation of termination power and dynamic power is explained.
|
||||||
%
|
%
|
||||||
\subsection{Termination Power}
|
\subsection{Termination Power}
|
||||||
\todo{also counting commands as for core power}
|
\todo{also counting commands as for core power}
|
||||||
@@ -630,7 +639,7 @@ In the following, the calculation of termination power and dynamic power is expl
|
|||||||
% \label{fig:term_push_pull}
|
% \label{fig:term_push_pull}
|
||||||
%\end{figure}
|
%\end{figure}
|
||||||
%
|
%
|
||||||
The termination power depends on the termination scheme, the \todo{signal/bus on time}, and the ratio between the two logic levels, but it is independent of the operating frequency.
|
The termination power depends on the termination scheme, the ratio between the two logic levels, and the time during which the termination is enabled, but it is independent of the operating frequency.
|
||||||
There are three commonly used termination schemes for DRAM, shown in Figure~\ref{fig:term} for a simple point-to-point connection.
|
There are three commonly used termination schemes for DRAM, shown in Figure~\ref{fig:term} for a simple point-to-point connection.
|
||||||
%
|
%
|
||||||
\begin{figure*}
|
\begin{figure*}
|
||||||
@@ -766,14 +775,14 @@ The dissipated power is calculated as
|
|||||||
\begin{equation}
|
\begin{equation}
|
||||||
P_{term,0}^{PODL} = \frac{V_{DDQ}^2}{R_{ON} + R_{TT}}.
|
P_{term,0}^{PODL} = \frac{V_{DDQ}^2}{R_{ON} + R_{TT}}.
|
||||||
\end{equation}
|
\end{equation}
|
||||||
In the case of an LVSTL interface, the equations are reversed, i.e.,
|
In the case of an LVSTL interface, the equations for both logic levels are reversed. %, i.e.,
|
||||||
\begin{equation}
|
%\begin{equation}
|
||||||
P_{term,0}^{LVSTL} = 0
|
%P_{term,0}^{LVSTL} = 0
|
||||||
\end{equation}
|
%\end{equation}
|
||||||
and
|
%and
|
||||||
\begin{equation}
|
%\begin{equation}
|
||||||
P_{term,1}^{LVSTL} = \frac{V_{DDQ}^2}{R_{ON} + R_{TT}}.
|
%P_{term,1}^{LVSTL} = \frac{V_{DDQ}^2}{R_{ON} + R_{TT}}.
|
||||||
\end{equation}
|
%\end{equation}
|
||||||
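As a rough numerical illustration of the PODL and LVSTL expressions above, assume $V_{DDQ} = 1.1\,\mathrm{V}$, $R_{ON} = 34\,\Omega$ and $R_{TT} = 48\,\Omega$. These values are chosen for illustration only and are not taken from a specific datasheet. The power dissipated while the terminated logic level is driven is then
\[
\frac{V_{DDQ}^2}{R_{ON} + R_{TT}} = \frac{(1.1\,\mathrm{V})^2}{34\,\Omega + 48\,\Omega} \approx 14.8\,\mathrm{mW},
\]
which corresponds to $P_{term,0}^{PODL}$ for PODL and to $P_{term,1}^{LVSTL}$ for LVSTL, while no termination power is dissipated at the respective other logic level.
Under the additional assumption that both logic levels occur equally often, the average termination power per signal would be about $7.4\,\mathrm{mW}$ in both cases.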
The SSTL interface uses both a pull-up and a pull-down resistor; therefore, power is dissipated at both logic levels.
|
The SSTL interface uses both a pull-up and a pull-down resistor; therefore, power is dissipated at both logic levels.
|
||||||
It can be calculated as
|
It can be calculated as
|
||||||
\begin{equation}
|
\begin{equation}
|
||||||
@@ -911,11 +920,11 @@ The dynamic power $P_{dyn}$, which adds to the termination power due to the togg
|
|||||||
\begin{equation}
|
\begin{equation}
|
||||||
P_{dyn} = P_{total} - P_{term}.
|
P_{dyn} = P_{total} - P_{term}.
|
||||||
\end{equation}
|
\end{equation}
|
||||||
One alternative formula, which is often used to calculate the dynamic power~\cite{joukah_12,joukah_15}, is
|
One alternative formula, which is often used to calculate the dynamic power $P_{dyn}$, is given by
|
||||||
\begin{equation}\label{eq:approx}
|
\begin{equation}\label{eq:approx}
|
||||||
P_{dyn} = \left(\sum_i C_i V_{sw,i}\right) \frac{V_{DDQ} \cdot f}{2}
|
P_{dyn} = \left(\sum_i C_i V_{sw,i}\right) \frac{V_{DDQ} \cdot f}{2}
|
||||||
\end{equation}
|
\end{equation}
|
||||||
where $C_i$ are the capacitances along the channel and $V_{sw,i}$ the respective voltage swings at each capacitance.
|
where $C_i$ are the capacitances along the channel and $V_{sw,i}$ the respective voltage swings at each capacitance~\cite{bak_90,joukah_12,joukah_15}.
|
||||||
The voltage swings are usually determined using a DC analysis for both logic levels.
|
The voltage swings are usually determined using a DC analysis for both logic levels.
|
||||||
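To make Equation~\ref{eq:approx} concrete, consider a single lumped capacitance with the following assumed values, chosen purely for illustration: $C = 2\,\mathrm{pF}$, $V_{sw} = 0.5\,\mathrm{V}$, $V_{DDQ} = 1.1\,\mathrm{V}$ and $f = 1\,\mathrm{GHz}$. The approximation then yields
\[
P_{dyn} = \left(2\,\mathrm{pF} \cdot 0.5\,\mathrm{V}\right) \frac{1.1\,\mathrm{V} \cdot 1\,\mathrm{GHz}}{2} = 0.55\,\mathrm{mW}.
\]
Since $f$ enters linearly, the approximation predicts that the dynamic power doubles when the frequency doubles, provided that the voltage swing stays at its DC value.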
While this approximation provides accurate results at low operating frequencies, current-generation DRAM interfaces no longer reach full swing due to the large parasitic capacitances in combination with high operating frequencies.
|
While this approximation provides accurate results at low operating frequencies, current-generation DRAM interfaces no longer reach full swing due to the large parasitic capacitances in combination with high operating frequencies.
|
||||||
Figure~\ref{fig:power_comp} shows the total power dissipation at different operating frequencies calculated with SPICE, Equation~\ref{eq:fourier} and Equation~\ref{eq:approx}.
|
Figure~\ref{fig:power_comp} shows the total power dissipation at different operating frequencies calculated with SPICE, Equation~\ref{eq:fourier} and Equation~\ref{eq:approx}.
|
||||||
|
|||||||