From 021a62a98582b9b12c99e05dea1cf89488465779 Mon Sep 17 00:00:00 2001
From: Lukas Steiner <lukas.steiner@rptu.de>
Date: Thu, 14 Nov 2024 17:56:20 +0000
Subject: [PATCH] Update on Overleaf.

---
 drampower-main.tex       | 142 ++++++++++++++++++++++-----------------
 img/refresh_currents.tex |   3 +-
 2 files changed, 82 insertions(+), 63 deletions(-)

diff --git a/drampower-main.tex b/drampower-main.tex
index b5b4ea1..42637f3 100644
--- a/drampower-main.tex
+++ b/drampower-main.tex
@@ -563,7 +563,7 @@ This relationship can be translated into the following equation to calculate $I_
 %\end{equation}
 %
 %
-\section{Interface Power Modeling}
+\section{Interface Power Modeling}\label{sec:interface_power_modeling}
 %
 Interface power refers to the power consumed by the drivers for the communication between memory controller and DRAM devices.
 In contrast to the core power, which is fixed for a specific device, the interface power depends on the complete DRAM subsystem architecture, i.e., the physical layer (PHY) of the memory controller, the channel architecture (number of ranks, possible usage of DIMMs, etc.) , the channel characteristics (e.g., channel loss and parasitic capacitances) and the DRAM PHYs.
@@ -573,7 +573,6 @@ Interface power can be divided into \textit{termination power}, which is dissipa
 In the following two sections, the calculation of termination power and dynamic power is explained.
 %
 \subsection{Termination Power}
-\todo{also counting commands as for core power}
 %
 %%%%
 %\begin{figure}
@@ -726,10 +725,9 @@ There are three commonly used termination schemes for DRAM, shown in Figure~\ref
         \label{fig:term_sstl}
     \end{subfigure}
 %
-    \caption[DRAM Interface Termination Schemes]{DRAM Interface Termination Schemes\footnotemark}
+    \caption{DRAM Interface Termination Schemes}
     \label{fig:term}
 \end{figure*}
-\footnotetext{The pull-up driver can be implemented with either PMOS or NMOS transistors.}
 %
 \textit{Pseudo open drain logic} (PODL) and \textit{low voltage swing terminated logic} (LVSTL) only use a pull-up or a pull-down resistor, respectively.
 In contrast, \textit{stub series terminated logic} (SSTL) uses both a pull-up and a pull-down resistor.
@@ -1102,61 +1100,28 @@ Finally, the switching activity $\alpha$ can be determined by counting the numbe
 %
 %
 %
-\section{Simulator Architecture}
+\section{Simulator Overview}
 %
-The new version of DRAMPower is not designed as a standalone simulator, but as a library that has to be coupled to a DRAM subsystem simulator, which models the DRAM controller and translates incoming read and write requests into DRAM commands. 
+This section provides a short introduction to the internal software architecture of DRAMPower.
+Afterwards, the simulation speed and simulation accuracy are evaluated.
+%
+\subsection{Simulator Architecture}
+\todo{multi-rank simulation, voltage domains!!!}
+%
+The new version of DRAMPower is not designed as a standalone simulator, but as a library that is coupled to a DRAM subsystem simulator which models the DRAM controller and translates incoming read and write requests into DRAM commands. 
 Alternatively, a DRAM command trace can be provided as an input file.
-For interface power calculation, the provided commands, addresses and data are translated into equivalent series of logic levels. 
+For the interface power calculation, the provided commands, addresses and data are translated into equivalent bit patterns using the command truth table of the simulated standard. 
 Based on this data, the number of transmitted zeros $n_0$, transmitted ones $n_1$ and zero to one transitions $n_{0 \rightarrow 1}$ can be calculated.
+To achieve high simulation speeds, bit manipulation instructions including the population count (\texttt{POPCNT}) instruction are used if available. 
+If no data is provided, a switching activity $\alpha$ and a ratio between both logic levels \todo{name} have to be provided.
+In addition to the command/address and data bus, the remaining signals like the clock signal pair, data strobe pairs or chip select need to be considered.
+As explained in Section~\ref{sec:interface_power_modeling}, the interface power calculation can become very complex and can depend on many parameters.
+\todo{In order to avoid handling all these cases within DRAMPower, the tool only receives the termination and dynamic power values for all signals as inputs.}
+These calculations need to be carried out externally using the provided equations.
 
-
-No standalone simulator, but coupled to e.g. DRAMSys
-\todo{ranks}
-\todo{count 1, 0 and 0->1 based on issued commands and data, alternatively use average values}
-\todo{count commands and clock cycles in each state for background power}
-
-The simulation kernel of DRAM Power uses a timestep based systems, to create a cycle accurate depiction of memory accesses. Different DRAM Standards are modeled as different classes inside the source code, to more accurately depict differences in DRAM behaviour. 
-
-
-The kernel takes as input a Memory Specification (MemSpec) file and a command list.  MemSpecs are a machine readable representation of a DRAM's spec sheet formatted as JSON. The command list contains traces of DRAM commands sent to the DRAM controller with corresponding timestamps, which will be processed during simulations. The command list can either be created manually or supplied in form of an input file, or from external tools directly, like simulation traces from DRAMSys.
-
-The simulation starts at timestamp t = 0 and iteratively processes each single command from the command list. Certain commands can issue following commands at a delayed cycle relative to their own execution. [braucht beispiel] Those deferred commands are referred as implicit commands inside DRAMPower and are inserted back into a command queue with a given timestamp. During every simulation step, the kernel checks if the command queue has pending implicit commands and executes them according to their timestamp.  
-
-
-DRAM Standards inside DRAMPower are programmatically modelled as classes. Since only a handful of behaviors are shared between DRAM Standards, each standard warrants its own implementation inside DRAMPower. Every implemented DRAM Standards inherits from a common base class, which handles all interaction with the kernel. The kernel dispatches commands to the instanced DRAM class, which then routes them through it's own function table, where commands are associated with implemented functions inside the DRAM class. 
-
-
-DRAMPower is also able to calculate interface power consumption. This is being achieved by simulating a bit accurate depiction of the command and data busses of a DRAM device. Each command of a given DRAM standard has a specified bit pattern, which is used by the controller to distinguish between commands. During execution the bits on the command bus constantly change, since the data bus is being overwritten with every incoming new command. This means, that the bits on the command bus can flip between cycles, thus leading to increases in power consumption. The same effect applies to the data bus as well, which is used to handle read and write commands.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-%
-\subsection{Simulation Kernel}
-%
-Windowing: Power can be evaluated during running simulation -> power over time is possible
-Handling implicit commands:
-Examples: Power Down Entry is not done when command is issued, but might be delayed
-RDA/WRA: auto-precharge is done after RD/WR is internally completed or only after tRAS is expired
-when command is issued, implicit command (lambda) is inserted into deque of implicit commands that is ordered by timestamp
-before we execute a new command or we request the window stats, we check if there are still outstanding requests in the implicit command queue with a timestamp smaller or equal to the current time 
-%
-DRAMPower does not use an event-driven simulation kernel, but it is only triggered externally when new commands are issued or when the total energy up to a certain point/the current time is requested.
-However, there is the case that a command that is issued at time $t$ only triggers an internal action/operation at time $t+x$.
-Thus, DRAMPower internally uses a queue that consists of a pair of a timestamp and a lambda expression.
-When a command is issued that triggers an action in the future, a lambda expression with the respective timestamp is inserted in the queue. 
-Whenever a new command is issued or the total energy is requested, it is first checked whether there are entries in the queue with a timestamp less or equal to the current timestamp.
-These lambdas are then first evaluated.
+The core power calculation is more complex because in addition to counting the number of issued commands of each type, DRAMPower needs to count the clock cycles that each DRAM device is in a specific state (i.e., 0 - B banks active, active/precharge power down, self refresh).
+This counting is made even more difficult by the fact that the internal state is not always changed immediately by an external command, but it can also change after a certain delay.
+An example of this behavior is shown in Figure~\ref{fig:implicit_commands}.
 %
 \begin{figure}
     \centering
@@ -1167,15 +1132,70 @@ These lambdas are then first evaluated.
     \label{fig:implicit_commands}
 \end{figure}
 %
-\subsection{Interface Power Calculation}
+When a read with auto precharge command (\texttt{RDA}) is issued, the target bank is automatically precharged after the read to precharge delay $t_{RTP}$ has expired.
+This means that the DRAM will internally issue what we call an \textit{implicit command} in the future.
+Unfortunately, DRAMPower is not based on an event-driven simulation kernel like SystemC where an event can be notified in the future.
+Instead, it is only triggered from the outside when issuing new commands, so the implicit commands need to be handled differently.
+The actions that are performed by one implicit command are formulated as a lambda expression, which is stored in an internal list ordered by the time stamp of execution.
+Whenever DRAMPower is triggered from the outside, first, the list is searched from the beginning for implicit commands with time stamps less than or equal to the current simulation time.
+The lambda expressions of these list entries are then evaluated before the external command is handled.
+The total power consumption can be calculated at any time, including during a simulation that is still running, which makes it possible to analyze the power consumption over time.
+%
+%\subsection{Simulation Kernel}
 %%
-Physical equations from section ..., 
-power depends on command, address and data because the number of transmitted 0/1/toggles changes
-termination power -> number of transmitted 0 and 1, efficiently calculated using population count (POPCNT) command
+%Windowing: Power can be evaluated during running simulation -> power over time is possible
+%Handling implicit commands:
+%Examples: Power Down Entry is not done when command is issued, but might be delayed
+%RDA/WRA: auto-precharge is done after RD/WR is internally completed or only after tRAS is expired
+%when command is issued, implicit command (lambda) is inserted into deque of implicit commands that is ordered by timestamp
+%before we execute a new command or we request the window stats, we check if there are still outstanding requests in the implicit command queue with a timestamp smaller or equal to the current time 
+%%
+%DRAMPower does not use an event-driven simulation kernel, but it is only triggered externally when new commands are issued or when the total energy up to a certain point/the current time is requested.
+%However, there is the case that a command that is issued at time $t$ only triggers an internal action/operation at time $t+x$.
+%Thus, DRAMPower internally uses a queue that consists of a pair of a timestamp and a lambda expression.
+%When a command is issued that triggers an action in the future, a lambda expression with the respective timestamp is inserted in the queue. 
+%Whenever a new command is issued or the total energy is requested, it is first checked whether there are entries in the queue with a timestamp less or equal to the current timestamp.
+%These lambdas are then first evaluated.
+%%
+%\begin{figure}
+%    \centering
+%    \resizebox{\linewidth}{!}{%
+%    \input{img/implicit_commands}
+%    }
+%    \caption{Example for Implicit Command}
+%    \label{fig:implicit_commands}
+%\end{figure}
+%
+%Windowing: Power can be evaluated during running simulation -> power over time is possible
+%
+%No standalone simulator, but coupled to e.g. DRAMSys
+%\todo{ranks}
+%\todo{count 1, 0 and 0->1 based on issued commands and data, alternatively use average values}
+%\todo{count commands and clock cycles in each state for background power}
+%
+%The simulation kernel of DRAM Power uses a timestep based systems, to create a cycle accurate depiction of memory accesses. Different DRAM Standards are modeled as different classes inside the source code, to more accurately depict differences in DRAM behaviour. 
+%
+%
+%The kernel takes as input a Memory Specification (MemSpec) file and a command list.  MemSpecs are a machine readable representation of a DRAM's spec sheet formatted as JSON. The command list contains traces of DRAM commands sent to the DRAM controller with corresponding timestamps, which will be processed during simulations. The command list can either be created manually or supplied in form of an input file, or from external tools directly, like simulation traces from DRAMSys.
+%
+%The simulation starts at timestamp t = 0 and iteratively processes each single command from the command list. Certain commands can issue following commands at a delayed cycle relative to their own execution. [braucht beispiel] Those deferred commands are referred as implicit commands inside DRAMPower and are inserted back into a command queue with a given timestamp. During every simulation step, the kernel checks if the command queue has pending implicit commands and executes them according to their timestamp.  
+%
+%
+%DRAM Standards inside DRAMPower are programmatically modelled as classes. Since only a handful of behaviors are shared between DRAM Standards, each standard warrants its own implementation inside DRAMPower. Every implemented DRAM Standards inherits from a common base class, which handles all interaction with the kernel. The kernel dispatches commands to the instanced DRAM class, which then routes them through it's own function table, where commands are associated with implemented functions inside the DRAM class. 
+%
+%
+%DRAMPower is also able to calculate interface power consumption. This is being achieved by simulating a bit accurate depiction of the command and data busses of a DRAM device. Each command of a given DRAM standard has a specified bit pattern, which is used by the controller to distinguish between commands. During execution the bits on the command bus constantly change, since the data bus is being overwritten with every incoming new command. This means, that the bits on the command bus can flip between cycles, thus leading to increases in power consumption. The same effect applies to the data bus as well, which is used to handle read and write commands.
+%
+%%
+%%
+%\subsection{Interface Power Calculation}
+%%%
+%Physical equations from section ..., 
+%power depends on command, address and data because the number of transmitted 0/1/toggles changes
+%termination power -> number of transmitted 0 and 1, efficiently calculated using population count (POPCNT) command
 %
 \subsection{Simulation Speed}
 %
-
 \begin{figure}
     \centering
     \resizebox{\linewidth}{!}{%
@@ -1250,7 +1270,7 @@ The results are shown in Figure~\ref{fig:power_plot}.
 As it can be seen, the $I_{DD}$ currents in the datasheet are overly pessimistic for all vendors:
 The simulations based on the datasheets show on average a $4.8\times$ higher power consumption than the actual power measurements.
 However, when the measured currents are applied to the simulation, there is still a small discrepancy:
-This can be explained by the fact that the measurement platform only measures the core power and not the interface power.
+This can be explained by the fact that the \todo{wrong:} measurement platform only measures the core power and not the interface power.
 As DRAMPower also includes interface power estimates, it therefore reports a higher total power.
 
 % LP4 vs LP5
diff --git a/img/refresh_currents.tex b/img/refresh_currents.tex
index 043be87..09e5fc6 100644
--- a/img/refresh_currents.tex
+++ b/img/refresh_currents.tex
@@ -8,7 +8,6 @@
 \newcommand{\ya}{1}
 \newcommand{\yb}{2.6}
 \newcommand{\yc}{6}
-\newcommand{\yd}{7}
 
 \pgfdeclarelayer{background}
 \pgfsetlayers{background, main}
@@ -21,7 +20,7 @@
     ymin=0, ymax=7,
     xtick=\empty,
     ytick=\empty,
-    extra y ticks={\ya, \yb, \yc, \yd},
+    extra y ticks={\ya, \yb, \yc},
     extra y tick labels={$I_{DD2N}$, $I_{DD5A}$, $I_{DD5B}$},
     axis x line=middle,
     axis y line=middle,