Update on Overleaf.
@@ -22,7 +22,7 @@
|
||||
%% for your publication.
|
||||
%%
|
||||
%%
|
||||
\documentclass[sigconf, anonymous, review]{acmart}
|
||||
\documentclass[sigconf, anonymous, review, nonacm=true]{acmart}
|
||||
%\documentclass[sigconf]{acmart}
|
||||
|
||||
%%
|
||||
@@ -235,8 +235,8 @@ This paper presents DRAMPower 5, a revised version of the popular DRAMPower simu
|
||||
The recent expansion of memory-intensive applications has led to increased demand for DRAM bandwidth and capacity in current computing systems.
|
||||
This demand is particularly pronounced in \textit{Artificial Intelligence} (AI) applications, where specialized accelerator chips with immense DRAM bandwidths beyond 1\,TBps are used.
|
||||
However, these bandwidths come at the cost of high power consumption.
|
||||
Google recently demonstrated that for large machine learning models, more than 90\,\% of the system power is consumed by memory.
|
||||
In augmented reality devices for the Metaverse, memory can account for up to 80\,\% of power consumption.\todo{quellen} \cite{yankao_24}
|
||||
In datacenters, memory very often accounts for around 90\,\% of the system power~\cite{bou_24}.
|
||||
Even in embedded augmented reality devices for the Metaverse, memory can account for more than 40\,\% of power consumption~\cite{yankao_24}.
|
||||
Therefore, an accurate estimation of DRAM power consumption is critical in the early stages of design in order to properly dimension the power supply circuits and cooling.
|
||||
In mobile devices, on the other hand, the overall power budget is constrained to only a few watts.
|
||||
Nevertheless, it is equally important to accurately estimate DRAM power consumption, for example to explore the power saving potential of new DRAM standards and their additional features to extend battery life~\cite{borgho_18}.
|
||||
@@ -390,12 +390,6 @@ As the internal architecture of modern DRAM devices is very complex and highly p
|
||||
However, each DRAM standard defines a set of currents for fixed operating scenarios, which are listed in vendor datasheets.
|
||||
Based on these currents, the core power can be estimated.
|
||||
%%%%
|
||||
\todo{Place in other position}
|
||||
This approach still faces two problems, which have been highlighted by Ghose et al. in~\cite{ghoyag_18}.
|
||||
First, the device-to-device variations are very large, which forces the vendors to be very pessimistic when specifying the operating currents.
|
||||
As a consequence, power is overestimated in most cases.
|
||||
Second, the currents are measured for fixed data and address patterns, i.e., no data dependencies and structural variations within the device are considered.
|
||||
If a more accurate modeling is required, the calculations can still be refined with additional device measurements.
|
||||
%%%%
|
||||
The following section provides an overview of these currents.
|
||||
%
|
||||
@@ -425,8 +419,13 @@ Similarly, the refresh currents are also measured under various conditions.
|
||||
While DDR standards specify a burst refresh current $I_{DD5B}$ for all available refresh modes, LPDDR standards specify a burst refresh current only for all-bank refresh, while for per-bank refresh, an average current $I_{DD5A}$ is provided.
|
||||
The difference between $I_{DD5B}$ and $I_{DD5A}$ is the spacing between two consecutive refresh commands.
|
||||
It is the refresh cycle time $t_{RFC}$ (i.e., the duration of a single refresh operation) for $I_{DD5B}$ and the much longer average refresh interval $t_{REFI}$ (i.e., the interval at which refresh commands need to be issued in normal operation) for $I_{DD5A}$.
|
||||
GDDR5/5X/6 and HBM1/2 do not specify a current for per-bank refresh at all although they support it, while HBM3 specifies a burst refresh current both for all-bank and per-bank refresh.
|
||||
%GDDR5/5X/6 and HBM1/2 do not specify a current for per-bank refresh at all although they support it.
|
||||
Section~\ref{subsec:refresh} shows how refresh power can be modeled using the provided currents of each standard.
|
||||
Even if all missing currents are calculated, the approach used for core power calculation still faces two problems, which have been highlighted in~\cite{ghoyag_18}.
|
||||
First, the device-to-device variations are very large, which forces the vendors to be very pessimistic when specifying operating currents.
|
||||
As a consequence, power is overestimated in most cases.
|
||||
Second, the currents are measured for fixed data and address patterns, i.e., no data dependencies and structural variations within the device are considered.
|
||||
If more accurate modeling is required, the calculations have to be refined with additional device measurements.
|
||||
\todo{last subsection? extra features, maybe future work?}
|
||||
\todo{multiple supply voltages!}
|
||||
|
||||
@@ -545,7 +544,7 @@ This relationship can be translated into the following equation to calculate $I_
|
||||
\begin{equation}
|
||||
I_{DD5B} = I_{DD2N} + \left(I_{DD5A} - I_{DD2N}\right) \cdot \frac{t_{REFI}}{t_{RFC}}
|
||||
\end{equation}
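%
As a plausibility check, the equation can be solved for $I_{DD5A}$.
This rearrangement is only an illustrative sketch and introduces no symbols beyond those used above:
\[
    I_{DD5A} = I_{DD2N} + \left(I_{DD5B} - I_{DD2N}\right) \cdot \frac{t_{RFC}}{t_{REFI}}
\]
Read this way, the average refresh current is the background current $I_{DD2N}$ plus the excess current drawn during a refresh burst, weighted by the fraction $t_{RFC}/t_{REFI}$ of time the device actually spends refreshing.
For $t_{RFC} = t_{REFI}$, i.e., back-to-back refresh commands, both currents coincide as expected.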
|
||||
|
||||
%
|
||||
%The equation can be used to calculate the burst refresh current of different refresh modes by substituting the average refresh current $I_{DD5A}$, refresh interval $t_{REFI}$ and refresh cycle time $t_{RFC}$ with the appropriate values.
|
||||
%During refresh, the device is considered in active state because internally the banks are constantly activated and refreshed.
|
||||
%The energy for an all-bank refresh command can be calculated as
|
||||
@@ -566,18 +565,11 @@ This relationship can be translated into the following equation to calculate $I_
|
||||
%
|
||||
\section{Interface Power Modeling}
|
||||
%
|
||||
\todo{cite \cite{dalpou_98} \cite{bak_90}}
|
||||
%
|
||||
Interface power refers to the power consumed by the drivers for the communication between memory controller and DRAM devices.
|
||||
In contrast to the core power, which is fixed for a specific device, the interface power depends on the complete DRAM subsystem architecture, i.e., the physical layer (PHY) of the memory controller, the channel architecture (e.g., number of ranks), the channel characteristics (e.g., channel loss and parasitic capacitances) and the DRAM PHYs.
|
||||
\todo{modeling based on currents not possible, moreover, currents measured with ODT disabled}
|
||||
It can be divided into two parts:
|
||||
%
|
||||
\begin{itemize}
|
||||
\item \textit{Termination power} is dissipated across the termination resistances required for signal integrity.
|
||||
\item \textit{Dynamic power} is dissipated through the lossy charging and discharging of parasitic capacitances and the signaling over a lossy transmission line.
|
||||
\end{itemize}
|
||||
%
|
||||
In contrast to the core power, which is fixed for a specific device, the interface power depends on the complete DRAM subsystem architecture, i.e., the physical layer (PHY) of the memory controller, the channel architecture (e.g., the number of ranks or the use of DIMMs), the channel characteristics (e.g., channel loss and parasitic capacitances) and the DRAM PHYs.
|
||||
Thus, modeling based on the operating currents specified in vendor datasheets is not possible, as they are only measured for one specific subsystem architecture.
|
||||
Instead, we calculate the interface power based on an equivalent circuit diagram of the real interface architecture as is also done by CACTI-IO.
|
||||
Interface power can be divided into \textit{termination power}, which is dissipated across the termination resistances required for signal integrity, and \textit{dynamic power}, which is dissipated through the lossy charging and discharging of parasitic capacitances and the signaling over a lossy transmission line.
|
||||
In the following two sections, the calculation of termination power and dynamic power is explained.
|
||||
%
|
||||
\subsection{Termination Power}
|
||||
@@ -740,8 +732,8 @@ There are three commonly used termination schemes for DRAM, shown in Figure~\ref
|
||||
\footnotetext{The pull-up driver can be implemented with either PMOS or NMOS transistors.}
|
||||
%
|
||||
\textit{Pseudo open drain logic} (PODL) and \textit{low voltage swing terminated logic} (LVSTL) only use a pull-up or a pull-down resistor, respectively.
|
||||
In contrast, \textit{stub series terminated logic} (SSTL) uses both a pull-up and a pull-down resistor. \todo{so the termination voltage level $V_{TT}$ is at $V_{DDQ}/2$.}
|
||||
In all three cases, the termination resistance is matched to the characteristic impedance of the transmission line, i.e., $R_{TT} \approx Z_0$ (remember that in AC analysis a DC voltage source is treated as a short).\todo{lossy transmission line}
|
||||
In contrast, \textit{stub series terminated logic} (SSTL) uses both a pull-up and a pull-down resistor.
|
||||
In all three cases, the termination resistance is matched to the characteristic impedance of the transmission line, i.e., $R_{TT} \approx Z_0$ (remember that in AC analysis a DC voltage source is treated as a short).
|
||||
To calculate the power, both logic levels are considered separately.
|
||||
The transistor of the driver that is switched on is replaced with an equivalent resistor with resistance $R_{ON}$, while the transistor that is switched off is replaced with an open line.
|
||||
As an example, Figure~\ref{fig:terminations} shows the two equivalent circuit diagrams for a PODL interface.
|
||||
@@ -835,8 +827,8 @@ The average termination power for transmitting $n_0$ logic zeros and $n_1$ logic
|
||||
%\begin{equation}
|
||||
% E_{term} = (P_{term,0} \cdot n_0 + P_{term,1} \cdot n_1) \cdot t_b.
|
||||
%\end{equation}
|
||||
Because with PODL and LVSTL only one logic level consumes power, a data bus inversion mechanism can be implemented to reduce the power consumption.
|
||||
With SSTL the termination power is independent of the transmitted data.
|
||||
Because with PODL and LVSTL only one logic level consumes power, data bus inversion can be used to reduce the termination power consumption.
|
||||
With SSTL, the termination power is independent of the transmitted data.
|
||||
|
||||
When using channel configurations with multiple ranks or DIMMs, the interconnect network can change from a simple point-to-point topology to a more complex topology, e.g., because the non-target dies also terminate the bus.
|
||||
In these cases, termination power can be calculated in the same way by determining the equivalent circuit diagrams for both logic levels.
|
||||
@@ -899,8 +891,8 @@ Figure~\ref{fig:load_caps} shows the simple point-to-point connection with PODL
|
||||
\end{figure}
|
||||
%
|
||||
We analyze the power dissipation of this circuit for different operating frequencies as input using SPICE.
|
||||
The components are dimensioned as $R_{ON}$ = $R_{TT}$ = \SI{50}{\ohm}, $C_{TX}$ = $C_{RX}$ = \SI{1}{\pico\farad} and $V_{DDQ}$ = \SI{1}{\volt}, which is in the order of real DRAM interfaces.
|
||||
For now, the transmission line is modeled as a short.
|
||||
The components are dimensioned as $R_{ON}$ = \SI{40}{\ohm}, $R_{TT}$ = \SI{60}{\ohm}, $C_{TX}$ = $C_{RX}$ = \SI{1}{\pico\farad} and $V_{DDQ}$ = \SI{1.1}{\volt}, which is in the order of a real DDR5 interface.
|
||||
For now, the transmission line is also only modeled as a parasitic capacitance with $C_{TL}$ = \SI{2}{\pico\farad}.
|
||||
%
|
||||
%\begin{figure}
|
||||
% \centering
|
||||
@@ -921,9 +913,9 @@ For now, the transmission line is modeled as a short.
|
||||
% \label{fig:enter-label}
|
||||
%\end{figure}
|
||||
%
|
||||
At frequencies below \SI{100}{\mega\hertz}, the dissipated power is approximately \SI{5}{\milli\watt}, which corresponds to the termination power of the circuit.
|
||||
At a frequency of \SI{100}{\mega\hertz}, the dissipated power is \SI{6.2}{\milli\watt}, which is close to the termination power of the circuit of \SI{6.1}{\milli\watt}.
|
||||
However, with increasing frequency, the power also increases because the capacitors start to conduct.
|
||||
At \SI{3200}{\mega\hertz} (i.e., 6.4\,Gbps/pin at DDR), the dissipated power is already \SI{6.5}{\milli\watt}, i.e., \SI{30}{\percent} higher than the pure termination power.
|
||||
At \SI{1600}{\mega\hertz} (i.e., 3.2\,Gbps/pin at DDR), the dissipated power is already \SI{8.6}{\milli\watt}, i.e., \SI{40}{\percent} higher than the pure termination power.
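%
The quoted termination power can be reproduced by hand.
The following sketch assumes that the PODL circuit draws static current only while the pull-down driver is switched on, and that the clock signal is low for half of the time:
\[
    P_{term} \approx \frac{1}{2} \cdot \frac{V_{DDQ}^2}{R_{ON} + R_{TT}} = \frac{1}{2} \cdot \frac{(\SI{1.1}{\volt})^2}{\SI{40}{\ohm} + \SI{60}{\ohm}} \approx \SI{6.05}{\milli\watt}
\]
Under these assumptions, the result rounds to the \SI{6.1}{\milli\watt} stated above, and the ratio $8.6 / 6.1 \approx 1.41$ is consistent with the increase of roughly \SI{40}{\percent} at \SI{1600}{\mega\hertz}.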
|
||||
To calculate the power dissipation analytically, the clock signal with frequency $f$ and voltage swing $V_{DDQ}$ can be expressed as a Fourier series
|
||||
\begin{equation}
|
||||
v(t) = \frac{V_{DDQ}}{2} + \Re \left\{\frac{-2j \cdot V_{DDQ}}{\pi} \sum_{k=1,3,5,\dots}^{\infty} \frac{1}{k} \exp(j 2 \pi f k t)\right\}.
|
||||
@@ -981,10 +973,10 @@ Figure~\ref{fig:power_comp} shows the total power dissipation at different opera
|
||||
bar width=2mm,
|
||||
legend pos=north west
|
||||
]
|
||||
\addplot+ coordinates {(100,5.05) (200,5.1) (400,5.2) (800,5.4) (1600,5.8) (3200,6.47) (6400,7.09)};
|
||||
\addplot+ coordinates {(100,5.05) (200,5.1) (400,5.2) (800,5.4) (1600,5.8) (3200,6.47) (6400,7.09)};
|
||||
\addplot+ coordinates {(100,5.05) (200,5.1) (400,5.2) (800,5.4) (1600,5.8) (3200,6.6) (6400,8.2)};
|
||||
\legend{SPICE, Fourier Series, Approximation}
|
||||
\addplot+ coordinates {(100,6.2) (200,5.1) (400,5.2) (800,5.4) (1600,5.8) (3200,6.47) (6400,7.09)};
|
||||
\addplot+ coordinates {(100,6.2) (200,5.1) (400,5.2) (800,5.4) (1600,5.8) (3200,6.47) (6400,7.09)};
|
||||
\addplot+ coordinates {(100,6.2) (200,5.1) (400,5.2) (800,5.4) (1600,5.8) (3200,6.6) (6400,8.2)};
|
||||
\legend{SPICE, Fourier Series (This Work), Approximation (CACTI-IO)}
|
||||
\end{axis}
|
||||
\end{tikzpicture}
|
||||
\caption{Comparison of Different Calculation Methods for Power Dissipation}
|
||||
|
||||
@@ -223,3 +223,22 @@ series = {ASPLOS '18}
|
||||
langid = {english},
|
||||
file = {/Users/myzinsky/Zotero/storage/22TRQV4G/Yang et al. - Characterization and Design of 3D-Stacked Memory for Image Signal Processing on ARVR Devices.pdf}
|
||||
}
|
||||
|
||||
@article{bou_24,
|
||||
title = {Fixing {{AI}}'s Energy Crisis},
|
||||
author = {Bourzac, Katherine},
|
||||
year = {2024},
|
||||
month = oct,
|
||||
journal = {Nature},
|
||||
publisher = {Nature Publishing Group},
|
||||
doi = {10.1038/d41586-024-03408-z},
|
||||
urldate = {2024-11-14},
|
||||
abstract = {Hardware that consumes less power will reduce artificial intelligence's appetite for energy. But transparency about its carbon footprint is still needed.},
|
||||
copyright = {2024 Springer Nature Limited},
|
||||
langid = {english},
|
||||
keywords = {Computer science,Engineering,Machine learning,Sustainability,Technology},
|
||||
annotation = {Bandiera\_abtest: a\\
|
||||
Cg\_type: Outlook\\
|
||||
Subject\_term: Machine learning, Sustainability, Technology, Computer science, Engineering},
|
||||
file = {/Users/myzinsky/Zotero/storage/2XJ6LXCA/d41586-024-03408-z.html}
|
||||
}
|
||||
|
||||