DbiPlayer progress

2022-05-11 22:03:45 +02:00
parent c11f09ebe2
commit 0584e7fe6b
7 changed files with 179 additions and 89 deletions

@@ -6,9 +6,10 @@ At first, the DynamoRIO analyzer tool that produces the memory access traces and
Furthermore, special focus is placed on the trace player for DRAMSys as well as on the mandatory cache model that is used to model the cache filtering of a real system.
The last part concentrates on the special architecture of the new trace player and the challenges that its internal interconnection solves.
\subsection{Analysis Tool}
\label{sec:analysis_tool}
As described in section \ref{sec:dynamorio}, the dynamic binary instrumentation tool DynamoRIO is used to trace the memory accesses while the target application is running.
Instead of writing a DynamoRIO client from the ground up, the DrCacheSim framework is used.
DrCacheSim is a DynamoRIO client that gathers memory and instruction access traces and forwards them to an analyzer tool.
@@ -30,21 +31,23 @@ In case of the online tracing, DrCacheSim consists of two seperate processes:
The analyzer side can contain many analysis tools that operate on this stream of records.
\end{itemize}
The \revabbr{inter-process communication}{IPC} between the two parts is achieved through a \textit{named\ pipe}.
Figure \ref{fig:drcachesim} illustrates the structure of the individual parts.
\input{img/thesis.tikzstyles}
\begin{figure}
\begin{center}
\tikzfig{img/drcachesim}
\caption{Structure of the DrCacheSim online tracing.}
\label{fig:drcachesim}
\end{center}
\end{figure}
A \texttt{memref\_t} can either represent an instruction, a data reference or a metadata event such as a timestamp or a CPU identifier.
Besides the type, the \revabbr{process identifier}{PID} and \revabbr{thread identifier}{TID} are included in every record so that records can be associated with the process and thread they originate from.
For an instruction marker, the size of the instruction as well as the virtual address of the instruction in the memory map is provided.
DrCacheSim stores the current mapping of all binary executables and shared libraries in a separate file, so that it is possible to decode named instructions even after the application has exited.
For data references, the address and size of the desired access are provided as well as the \revabbr{program counter}{PC} from which it was initiated.
Analysis tools implement the \texttt{analysis\_tool\_t} interface as this enables the analyzer to forward a received record to multiple tools in a polymorphic manner.
In particular, the \texttt{process\_memref()} method of a tool is called for every incoming record.
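The forwarding of records to several tools can be illustrated with a minimal sketch. It is written in Python for brevity and is only an analogue of the mechanism described above, not the DynamoRIO C++ API: the names \texttt{Record}, \texttt{AnalysisTool}, \texttt{process\_record} and \texttt{run\_analyzer} as well as the two example tools are hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for a trace record (not the real memref_t layout)."""
    kind: str  # "instr", "read", "write" or "marker"
    pid: int
    tid: int
    addr: int
    size: int


class AnalysisTool(ABC):
    """Hypothetical Python analogue of an analysis tool interface."""

    @abstractmethod
    def process_record(self, record):
        """Handle one record; return False to signal an error."""


class AccessCounter(AnalysisTool):
    """Counts data references per thread."""

    def __init__(self):
        self.counts = {}

    def process_record(self, record):
        if record.kind in ("read", "write"):
            self.counts[record.tid] = self.counts.get(record.tid, 0) + 1
        return True


class InstructionBytes(AnalysisTool):
    """Sums the sizes of all instruction records."""

    def __init__(self):
        self.total = 0

    def process_record(self, record):
        if record.kind == "instr":
            self.total += record.size
        return True


def run_analyzer(records, tools):
    """Forward every record to every tool; stop at the first failure."""
    for record in records:
        for tool in tools:
            if not tool.process_record(record):
                return False
    return True
```

Because the analyzer only relies on the common interface, a further tool can be added to the list without changing the forwarding loop.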
@@ -54,10 +57,77 @@ As it is not known how many threads an application will spawn, the tool will lis
For every data reference, a new entry in the corresponding trace file is made which contains the size and the address of the access, whether it was a read or write, and also a count of (computational) instructions that have been executed since the last reference.
This instruction count is used to approximate the delay between the memory accesses when the trace is replayed by DRAMSys as described in section TODO.
\begin{listing}
\begin{textcode}
# instruction count,read/write,data size,data address
# <timestamp>
<13295366593324052>
4,r,8,1774ef30
0,r,8,1774ef38
1,w,8,1774ef28
2,w,8,1774ee88
0,r,8,17744728
1,r,8,238c3fb0
\end{textcode}
\caption{Example of a memory access trace with a timestamp.}
\label{list:memtrace}
\end{listing}
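To make the format in Listing \ref{list:memtrace} concrete, the following Python sketch parses such a trace file. It is an illustration and not the parser used by the DbiPlayer. It assumes that the address field is hexadecimal, that the timestamp is a decimal integer and that the data size is given in bytes; the names \texttt{Timestamp}, \texttt{Access} and \texttt{parse\_trace} are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Union


@dataclass(frozen=True)
class Timestamp:
    value: int


@dataclass(frozen=True)
class Access:
    instruction_count: int  # instructions executed since the last reference
    is_write: bool
    size: int               # access size (assumed to be in bytes)
    address: int            # parsed from the hexadecimal address field


def parse_trace(lines: Iterable[str]) -> Iterator[Union[Timestamp, Access]]:
    """Yield one entry per non-empty, non-comment line of a trace."""
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # header comments carry no data
        if line.startswith("<") and line.endswith(">"):
            yield Timestamp(int(line[1:-1]))
            continue
        fields = line.split(",")
        if len(fields) != 4 or fields[1] not in ("r", "w"):
            raise ValueError(f"line {number}: malformed record {line!r}")
        count, kind, size, address = fields
        yield Access(int(count), kind == "w", int(size), int(address, 16))
```

Timestamps and accesses are yielded in file order, so a consumer sees each timestamp before the accesses that follow it.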
As of writing this thesis, there is no application binary interface for analysis tools defined in the DrCacheSim framework.
Therefore, the DRAMTracer tool cannot be loaded as a shared library; instead, the DynamoRIO source code has to be modified to integrate the tool.
\subsection{DbiPlayer Architecture}
\label{sec:dbiplayer_architecture}
This section covers the general architecture of the DbiPlayer, the new trace player for DRAMSys that replays the captured trace files.
For every recorded thread, a new so-called DbiThreadPlayer is spawned, which is a standalone initiator for transactions.
Because these threads need to be synchronized to approximate the real behavior, they need to communicate with each other.
The detailed mechanism behind this synchronization will be further explained in section \ref{sec:dbiplayer_functionality}.
This communication, however, makes it necessary to encapsulate the thread players in a single module that can be connected directly to DRAMSys.
To achieve this, a new generic initiator interface was developed that makes it possible to connect components with an arbitrary internal architecture to DRAMSys.
In the case of the DbiPlayer, an additional interconnect module bundles all \texttt{simple\_initiator\_sockets} into a single \texttt{multi\_passthrough\_initiator\_socket} as presented in Figure \ref{fig:dbiplayer_without_caches}.
\begin{figure}
\begin{center}
\tikzfig{img/without_caching}
\caption{Architecture of the DbiPlayer without caches.}
\label{fig:dbiplayer_without_caches}
\end{center}
\end{figure}
As the memory accesses are directly extracted from the executed instructions, simply sending a transaction to the DRAM subsystem for every data reference would completely neglect the caches of today's processors.
Therefore, a cache model is also required; its implementation is explained in more detail in section \ref{sec:cache}.
Modern cache hierarchies typically consist of three cache levels: two caches for every processor core, the L1 and L2 cache, and one cache that is shared across all cores, the L3 cache.
(maybe add a literature reference here)
This hierarchy is also reflected in the DbiPlayer as shown in Figure \ref{fig:dbiplayer_with_caches}.
\begin{landscape}
\begin{figure}
\begin{center}
\tikzfig{img/with_caching}
\caption{Architecture of the DbiPlayer with caches.}
\label{fig:dbiplayer_with_caches}
\end{center}
\end{figure}
\end{landscape}
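The filtering effect of such a hierarchy can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the cache model of section \ref{sec:cache}: it assumes fully associative caches with LRU replacement and a line size of 64 bytes, and it ignores write-backs and coherence. All names are hypothetical.

```python
from collections import OrderedDict

LINE_SIZE = 64  # assumed cache line size in bytes


class LruCache:
    """Fully associative LRU cache that tracks line addresses only."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = OrderedDict()

    def access(self, line):
        """Return True on a hit; on a miss, insert the line and evict the LRU one."""
        if line in self.lines:
            self.lines.move_to_end(line)
            return True
        self.lines[line] = None
        if len(self.lines) > self.num_lines:
            self.lines.popitem(last=False)
        return False


class Hierarchy:
    """Private L1 and L2 per core, one L3 shared by all cores."""

    def __init__(self, cores, l1, l2, l3):
        self.l1 = [LruCache(l1) for _ in range(cores)]
        self.l2 = [LruCache(l2) for _ in range(cores)]
        self.l3 = LruCache(l3)

    def reaches_dram(self, core, address):
        """Return True if the access misses in all three levels."""
        line = address // LINE_SIZE
        if self.l1[core].access(line):
            return False
        if self.l2[core].access(line):
            return False
        return not self.l3.access(line)
```

Only accesses for which \texttt{reaches\_dram} returns true would be turned into transactions towards the DRAM subsystem; a line fetched by one core can later hit in the shared L3 when another core requests it.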
\subsection{DbiPlayer Functionality}
\label{sec:dbiplayer_functionality}
With the overall architecture of the initiator introduced, this section explains the internal functionality of the DbiPlayer and its threads.
As mentioned previously, the threads cannot run on their own; rather, they require synchronization to ensure that the simulated system replicates the real running application as well as possible.
The analysis tool inserts timestamps into the memory access traces. These are used either to pause the execution of a thread when the global time has not yet reached the timestamp, or to advance the global time when the thread is allowed to run.
Note that the term global time in this context does not correspond to the SystemC simulation time but denotes a loose time variable that the DbiPlayer uses to schedule its threads.
A set of rules determines whether a thread is allowed to make progress beyond a timestamp that lies ahead of the current global time:
\begin{enumerate}
\item The main thread at the start of the program is always allowed to run.
\item A thread does not go to sleep if doing so would produce a deadlock. This is the case when it is the only thread currently running.
\item When a previously running thread exits and all other threads are sleeping, the sleeping threads are woken up.
\item As a fallback, when all threads are currently waiting, one thread is woken up.
\end{enumerate}
These rules ensure that at least one thread is always running and that the simulation does not come to a premature halt.
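One possible reading of these rules is sketched below in Python. It is not the DbiPlayer implementation. Several details are assumptions made for the sake of a runnable example: a woken thread pulls the global time up to its timestamp, sleeping threads are also woken once the global time reaches their timestamp, and the fallback rule picks the thread with the earliest timestamp. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Scheduler:
    """Loose global time plus the run state of every thread player."""
    global_time: int = 0
    running: set = field(default_factory=set)
    sleeping: dict = field(default_factory=dict)  # tid -> timestamp it waits for

    def start_main(self, tid):
        # Rule 1: the main thread may always run at program start.
        self.running.add(tid)

    def _wake(self, tid):
        # Assumption: a woken thread pulls the global time up to its timestamp.
        self.global_time = max(self.global_time, self.sleeping.pop(tid))
        self.running.add(tid)

    def _wake_due(self):
        # Assumption: sleepers resume once the global time has reached them.
        due = [t for t, ts in self.sleeping.items() if ts <= self.global_time]
        for tid in due:
            self._wake(tid)

    def ensure_progress(self):
        # Rule 4 (fallback): if every thread is waiting, wake one of them.
        # Assumption: the thread with the earliest timestamp is chosen.
        if not self.running and self.sleeping:
            self._wake(min(self.sleeping, key=self.sleeping.get))

    def on_timestamp(self, tid, timestamp):
        """Return True if the thread may continue past the timestamp."""
        if timestamp > self.global_time and self.running != {tid}:
            # The timestamp lies ahead and this is not the sole running thread.
            self.running.discard(tid)
            self.sleeping[tid] = timestamp
            self.ensure_progress()
            return tid in self.running
        # Either the time was already reached, or rule 2 applies: the only
        # running thread must not sleep, otherwise nothing could wake it.
        self.running.add(tid)
        self.global_time = max(self.global_time, timestamp)
        self._wake_due()
        return True

    def on_exit(self, tid):
        self.running.discard(tid)
        if not self.running:
            # Rule 3: the last running thread exited, wake all sleepers.
            for sleeper in list(self.sleeping):
                self._wake(sleeper)
```

In this sketch, a thread player would call \texttt{on\_timestamp} whenever it reads a timestamp from its trace and would suspend itself if the call returns false.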
TODO: text about instruction count and clk