\section{DynamoRIO}
\label{sec:dynamorio}

This section gives a short overview of the dynamic binary instrumentation tool DynamoRIO, which is used throughout this thesis.
The content is mainly based on the chapters \textit{``DynamoRIO''}, \textit{``Code Cache''} and \textit{``Transparency''} of \cite{Bruening2004} as well as on \cite{Bruening2003}.

\subsection{Dynamic Binary Instrumentation}
\label{sec:dbi}

\revabbr{Dynamic binary instrumentation}{DBI} is a method to analyze, profile, manipulate and optimize the behavior of a binary application while it is executed.
This is achieved by injecting additional instructions into the instruction stream of the target application, which either collect statistics or deliberately alter the application's execution.

In comparison, debuggers insert special breakpoint instructions (e.g., INT3 on x86 or BKPT on ARM) at specific places in the code, which raise a debug exception when they are executed.
Each such exception triggers a context switch into the operating system kernel, which then passes control to the debugger.
These context switches incur a significant performance penalty, as the processor state has to be saved and restored each time, which makes this approach considerably less efficient than DBI.

DBI tools can either launch the target application themselves or attach to the application's process dynamically.
The former method allows instrumentation of even the early startup stage of the application, whereas the latter might be used if the application first has to be brought into a certain state or if the process cannot be restarted for reliability reasons.
Some DBI tools also allow the framework to be built directly into the application and controlled from its source code.
While this removes the flexibility of observing applications that are only available in binary form, it enables control over the DBI tool through its application programming interface.
With this method, it is possible to instrument precisely the code region of interest and to disable the tool elsewhere for performance reasons.

In all cases, the instrumentation tool executes in the same process and address space as the target application.
While this gives the DBI tool extensive control over the target application, it also makes it essential that the tool operates transparently, meaning that the application's behavior is not affected in unintended ways.
This is particularly challenging because neither the instrumentation tool nor the user-written instrumentation clients may use library routines for memory operations and allocation, synchronization or input/output buffering that interfere with the target application \cite{Bruening2003}.
This applies in particular to library routines that are not \textit{reentrant}, i.e., routines that cannot safely be invoked again while a previous invocation has not yet completed, typically because they keep internal static state.
The dispatcher of the DBI tool can gain control at arbitrary points of the application's execution, including in the middle of such a non-reentrant routine.
If the instrumentation tool or the user-written client then calls the same routine, the routine's internal state can be corrupted, resulting in undefined behavior.
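
The following sketch illustrates this problem in plain C, without any DynamoRIO API. The routine \texttt{join\_pair}, the callback pointer \texttt{dispatch\_point} and the function \texttt{client\_hook} are hypothetical: the callback stands in for the instrumentation tool gaining control halfway through a non-reentrant routine, and the client then re-enters that routine, which overwrites the shared static buffer the application is still working on.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical non-reentrant routine: builds "a:b" in a single static
 * buffer in two steps, like classic C library routines with static state. */
static char shared_buf[64];

/* Models the DBI tool gaining control in the middle of the routine. */
static void (*dispatch_point)(void) = NULL;

static const char *join_pair(const char *a, const char *b)
{
    snprintf(shared_buf, sizeof shared_buf, "%s", a);   /* step 1 */
    if (dispatch_point)
        dispatch_point();       /* the tool may run here */
    size_t len = strlen(shared_buf);
    snprintf(shared_buf + len, sizeof shared_buf - len, ":%s", b); /* step 2 */
    return shared_buf;
}

/* Hypothetical client code that (incorrectly) calls the same routine. */
static void client_hook(void)
{
    dispatch_point = NULL;             /* prevent endless recursion */
    (void)join_pair("client", "x");    /* re-enters join_pair */
}
```

Without the interleaved client call, \texttt{join\_pair("app", "1")} returns \texttt{app:1}; with \texttt{dispatch\_point} set to \texttt{client\_hook}, the application instead receives \texttt{client:x:1}, although its own code is unchanged.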

Naturally, the user-written client should make no assumptions about the behavior of the running system and should restore all modified registers and processor state, unless the interference with the application is intentional.
Most DBI tools offer two distinct methods of injecting user code into the application's instruction stream.
In the first, the framework saves all relevant registers and flags by itself and then dispatches execution to a user-defined function.
This is the easiest method, but it comes at the cost of a context switch into the instrumentation code.
The more advanced approach is to inject a few but sufficient instructions directly into the application's instruction stream.
Here, it is the responsibility of the user to save and restore all altered state.

Generally speaking, the application should not be able to detect that it is being instrumented by a DBI tool and should execute exactly as it would natively, even when it behaves incorrectly itself, e.g., by accessing invalid memory regions.

In summary, dynamic code analysis has the complete runtime information available, unlike static code analysis, which in general cannot predict the execution path of a program.
DBI is therefore a suitable choice for examining the runtime behavior of a binary application with reasonable performance.

Section \ref{sec:dynamorio_core} explains how the core functionality of the DBI tool DynamoRIO works.

\subsection{Core Functionality}
\label{sec:dynamorio_core}

A simple way to observe and potentially modify the instructions of an application during execution is the use of an interpretation engine that emulates the binary executable in its entirety.
Valgrind \cite{Valgrind} is a widely used framework based on this technique.
At its core, Valgrind uses a virtual machine and just-in-time compilation to instrument the target application.
This approach is powerful, but it comes at the cost of significantly reduced performance.

DynamoRIO, on the other hand, uses a so-called \textit{code cache} into which \textit{basic blocks} are copied prior to execution.
A basic block is a straight-line sequence of instructions from the target application's binary that ends with a single control transfer instruction.
The instrumentation instructions are inserted directly into these copies in the code cache.
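
To make the notion of a basic block concrete, the following sketch determines the extent of a block in a hypothetical toy instruction set represented as an array of opcodes; \texttt{opcode\_t} and \texttt{basic\_block\_length} are illustrative names and not part of DynamoRIO. A block extends from its start up to and including the first control transfer instruction. A real decoder additionally has to handle variable-length x86 or ARM encodings and possibly further block-ending conditions, which are omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy instruction set: only the distinction between
 * ordinary instructions and control transfers matters here. */
typedef enum { OP_ALU, OP_LOAD, OP_STORE, OP_JMP, OP_JCC, OP_CALL, OP_RET } opcode_t;

/* Control transfer instruction (CTI): ends a basic block. */
static int is_cti(opcode_t op)
{
    return op == OP_JMP || op == OP_JCC || op == OP_CALL || op == OP_RET;
}

/* Returns the number of instructions in the basic block starting at
 * index `start`: everything up to and including the first CTI. If no
 * CTI follows, the block extends to the end of the code. */
static size_t basic_block_length(const opcode_t *code, size_t n, size_t start)
{
    size_t i = start;
    while (i < n) {
        if (is_cti(code[i++]))
            break;
    }
    return i - start;
}
```

For the code \texttt{ALU, LOAD, JCC, STORE, RET}, the block at index 0 contains three instructions (ending with the conditional jump), and the block at index 3 contains the remaining two.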

To be able to execute the modified code, DynamoRIO appends an \textit{exit stub} for each exit of a basic block (e.g., two for a block ending in a conditional branch), ensuring that at the end of the block control is transferred back to DynamoRIO via a context switch.
There, the application's processor state is saved, the next basic block is looked up or, if necessary, copied into the code cache and instrumented, and it is finally executed after the previously saved state has been restored.
Basic blocks that are already located in the code cache are executed directly without being copied again; however, a context switch is still needed to determine the next basic block to execute.
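
The resulting dispatch loop can be modeled by the following minimal sketch. It assumes a hypothetical program given as the sequence of basic block indices it executes, and it only counts events instead of copying real code: a block is built once, on its first execution, whereas the dispatcher regains control before every executed block. The type \texttt{dispatcher\_t} and the function \texttt{run} are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLOCKS 16

/* Hypothetical toy model of the dispatch loop. Blocks are identified
 * by their index; no real code is copied or executed. */
typedef struct {
    bool cached[MAX_BLOCKS]; /* block already present in the code cache? */
    int  builds;             /* blocks copied and instrumented */
    int  dispatches;         /* times control passed through the dispatcher */
} dispatcher_t;

/* Executes the blocks seq[0..len) in order. */
static void run(dispatcher_t *d, const int *seq, int len)
{
    for (int i = 0; i < len; i++) {
        int b = seq[i];
        assert(b >= 0 && b < MAX_BLOCKS);
        d->dispatches++;          /* dispatcher determines the next block */
        if (!d->cached[b]) {      /* first execution: copy + instrument */
            d->cached[b] = true;
            d->builds++;
        }
        /* block b would now run from the code cache */
    }
}
```

For the execution sequence 0, 1, 2, 1, 2, 1, 2, 3, the model builds four blocks but passes through the dispatcher eight times; this per-block overhead is what the linking described below reduces.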

To reduce this overhead and avoid the context switch, DynamoRIO can \textit{link} a basic block directly to the target of its direct branch, i.e., a branch whose target address does not change during runtime.
To achieve this, the branch at the block's exit is patched in place to point to the target's address in the code cache instead of the original address in the mapped binary executable.
Indirect branches, i.e., branches whose target address is calculated at runtime, cannot be linked in this way, as their target basic blocks may vary.
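
The effect of linking can be illustrated with the following sketch, a hypothetical toy model that counts events instead of patching machine code. Each block is assumed to end either in an unconditional direct branch, whose exit is linked after the first transition, or in an indirect branch, which always returns to the dispatcher. The names \texttt{linker\_t} and \texttt{run\_linked} are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLOCKS 16

/* Hypothetical toy model of fragment linking. link[b] holds the block
 * that b's exit has been patched to jump to, or -1 if unlinked. */
typedef struct {
    int link[MAX_BLOCKS];
    int dispatches;   /* times control returned to the dispatcher */
} linker_t;

static void linker_init(linker_t *l)
{
    for (int b = 0; b < MAX_BLOCKS; b++)
        l->link[b] = -1;
    l->dispatches = 0;
}

/* Executes the blocks seq[0..len). ends_direct[b] states whether block
 * b ends in an unconditional direct branch (a single, fixed target);
 * otherwise it is assumed to end in an indirect branch. */
static void run_linked(linker_t *l, const bool *ends_direct,
                       const int *seq, int len)
{
    for (int i = 0; i < len; i++) {
        int b = seq[i];
        int prev = (i > 0) ? seq[i - 1] : -1;
        assert(b >= 0 && b < MAX_BLOCKS);
        if (prev >= 0 && l->link[prev] == b)
            continue;             /* linked: jump straight to b's fragment */
        l->dispatches++;          /* unlinked exit: back to the dispatcher */
        if (prev >= 0 && ends_direct[prev])
            l->link[prev] = b;    /* patch prev's exit to target b directly */
    }
}
```

For the sequence 0, 1, 2, 1, 2, 1, 2, 3, where blocks 0 and 1 end in direct branches and block 2 ends in an indirect branch, the dispatcher is entered six instead of eight times; three of these six entries are caused by the indirect branch at the end of block 2.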

To reduce the overhead of indirect branches as well, basic blocks that are frequently executed in sequence are merged into a \textit{trace}.
At each indirect branch within a trace, an inlined check compares the actual branch target with the target that was observed when the trace was built; if they match, execution stays within the trace and the context switch is avoided.
Such frequently executed parts of the application code are also referred to as \textit{hot code}. Optimizing them using traces improves performance, but has the minor disadvantage that the same basic block may exist in multiple copies in the code cache.
The generic term for a basic block or a trace is \textit{fragment}.
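
The inlined check can be sketched as follows, again as a hypothetical model rather than DynamoRIO code: a trace is represented by the application addresses of its blocks in the order recorded at creation time, and execution stays in the trace only as long as each actual successor matches the recorded one. For simplicity, every transition is checked, which also covers a conditional branch leaving the trace on its other side. The names \texttt{trace\_t} and \texttt{follow\_trace} are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical trace: application addresses of the stitched-together
 * basic blocks, in the order observed when the trace was created. */
typedef struct {
    const unsigned long *blocks;
    size_t len;
} trace_t;

/* Follows the trace for one execution whose actual block addresses are
 * actual[0..n). Returns how many blocks were executed inside the trace
 * before a check failed (trace exit to the dispatcher) or it ended. */
static size_t follow_trace(const trace_t *t, const unsigned long *actual, size_t n)
{
    size_t i = 0;
    /* inlined check: actual target == recorded target -> stay in trace */
    while (i < t->len && i < n && actual[i] == t->blocks[i])
        i++;
    return i;
}
```

If an execution reaches a different target than recorded (e.g., \texttt{0x250} instead of \texttt{0x200} in a trace recorded as \texttt{0x100, 0x200, 0x300}), the trace is left after its first block and control returns to the dispatcher.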

Figure \ref{fig:dynamorio} illustrates the internal architecture and functionality of DynamoRIO.
The application code is loaded by the dispatcher, modified by the basic block builder, copied into the code cache and finally executed from there.

\input{img/thesis.tikzstyles}
\begin{figure}
\begin{center}
\tikzfig{img/dynamorio}
\caption{DynamoRIO runtime code manipulation layer \cite{Bruening2004}.}
\label{fig:dynamorio}
\end{center}
\end{figure}

As mentioned in Section \ref{sec:dbi}, it is important for a DBI tool to operate transparently.
DynamoRIO takes a number of measures to achieve this goal, some of which will now be explained \cite{Bruening2004}.
As sharing libraries with the target application can cause transparency issues, especially when using non-reentrant routines or routines that alter global state such as error codes, DynamoRIO interfaces directly with the operating system via system calls and even avoids using the C standard library (e.g., \textit{glibc} on Linux).
The same applies to user-written instrumentation clients (introduced in more detail in Section \ref{sec:dynamorio_client}); however, clients are discouraged from issuing system calls directly, as this bypasses DynamoRIO's internal monitoring of changes to the process's address space.
Instead, DynamoRIO provides a cross-platform API for generic routines such as file system operations and memory allocation.
To guarantee thread transparency, DynamoRIO does not spawn threads of its own, but runs on the application's threads and maintains a separate DynamoRIO context for each of them.
If an instrumentation client needs to spawn threads, they should be hidden from the application's introspection.
Client code should also not alter the application stack in any way, as some specialized applications access data beyond the top of the stack.
Instead, DynamoRIO provides a separate stack for storing temporary data.
To remain undetected, DynamoRIO also has to protect its own memory from stray reads or writes by the application.
As in a native run, where this memory would be unallocated, such accesses must raise an exception.
However, since these memory regions are in fact allocated, DynamoRIO has to raise those exceptions itself to remain transparent.
When the application branches to a dynamically calculated address, DynamoRIO has to translate this address into the address of the corresponding fragment in the code cache.
Conversely, whenever a code cache address would be exposed to the application (e.g., the faulting instruction address in an exception context), it has to be translated back to the corresponding address in the mapped binary executable.
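
Both translation directions can be sketched with a simple table, assuming (hypothetically) one entry per fragment that records its application address, its code cache address and its size in the code cache; \texttt{fragment\_map\_t}, \texttt{app\_to\_cache} and \texttt{cache\_to\_app} are illustrative names. The backward direction here only recovers the start address of the enclosing block; reconstructing the exact application instruction, as required for a faulting instruction, needs additional information about the inserted instrumentation, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical translation table entry for one fragment. */
typedef struct {
    uintptr_t app_pc;    /* block address in the mapped binary */
    uintptr_t cache_pc;  /* start of its copy in the code cache */
    size_t    size;      /* bytes occupied in the code cache */
} fragment_map_t;

/* Forward: application address -> code cache address (0 if not built). */
static uintptr_t app_to_cache(const fragment_map_t *m, size_t n, uintptr_t app_pc)
{
    for (size_t i = 0; i < n; i++)
        if (m[i].app_pc == app_pc)
            return m[i].cache_pc;
    return 0;
}

/* Backward: any address inside a cached fragment -> application address
 * of that fragment's block (0 if the address is not in the cache). */
static uintptr_t cache_to_app(const fragment_map_t *m, size_t n, uintptr_t cache_pc)
{
    for (size_t i = 0; i < n; i++)
        if (cache_pc >= m[i].cache_pc && cache_pc < m[i].cache_pc + m[i].size)
            return m[i].app_pc;
    return 0;
}
```

A real implementation would use a hash table for the forward direction, since it is consulted on every unlinked indirect branch, and a range-based search structure for the backward direction.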

As can be seen, DynamoRIO makes significant efforts to ensure transparency.
However, timing deviations cannot be hidden, since the instrumentation code consists of additional instructions that have to be executed.
A sophisticated application could therefore try to detect the presence of an instrumentation tool by measuring and comparing the execution time of its own routines.

\subsection{Clients}
\label{sec:dynamorio_client}

Apart from executing the application from the code cache, the mechanisms introduced so far have no observable effect on the application.
To actually instrument it, DynamoRIO provides a programming interface for developing so-called \textit{clients} \cite{Bruening2004}.
Clients are user-written instrumentation tools and make it possible to dynamically modify the basic blocks, either to alter the application behavior or to insert observational instructions.
A DynamoRIO client is compiled into a shared library and passed to the \textit{drrun} utility using a command line option.
Clients implement a number of hook functions that will be called by DynamoRIO for certain events such as the creation of a basic block or of a trace.
Generally, there are two classes of hooks: hooks invoked on basic block creation see all executed application code, whereas hooks invoked on trace creation see only frequently executed code.
It is important to note that the hooks for basic block and trace creation are not called every time the corresponding code is executed, but only when the fragment is created and placed into the code cache.
Therefore, the observational or manipulative behavior has to be implemented by inserting the required instructions into the fragment's instruction stream at this stage, rather than in the hook function itself.
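
The difference between build time and run time can be illustrated with the following hypothetical toy model, which does not use DynamoRIO's API: a fragment is a list of operations modeled as function pointers, and the client hook \texttt{on\_basic\_block} runs once when the fragment is built and inserts a counting operation at its start. All names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OPS 8

/* Hypothetical toy fragment: operations executed in order. */
typedef void (*op_fn)(void);
typedef struct { op_fn ops[MAX_OPS]; size_t n; } fragment_t;

static int hook_calls;   /* how often the client hook ran (build time) */
static int block_execs;  /* updated by the inserted code (run time) */

static void app_op(void)   { /* stands for an original application instruction */ }
static void count_op(void) { block_execs++; }   /* inserted by the client */

/* Client hook, modeled after a basic block creation event: runs once
 * per build and inserts a counting operation at the block start. */
static void on_basic_block(fragment_t *f)
{
    assert(f->n < MAX_OPS);      /* room for the inserted operation */
    hook_calls++;
    for (size_t i = f->n; i > 0; i--)
        f->ops[i] = f->ops[i - 1];
    f->ops[0] = count_op;
    f->n++;
}

/* Runs the (instrumented) fragment once, as if from the code cache. */
static void execute(const fragment_t *f)
{
    for (size_t i = 0; i < f->n; i++)
        f->ops[i]();
}
```

Building the fragment once and executing it five times runs the hook once but the inserted counter five times, which is why the counting logic belongs into the fragment rather than into the hook.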

Table \ref{tab:dynamorio_api} lists some of the most important hooks that a client can implement.

\begin{table}
\caption{Client routines that are called by DynamoRIO \cite{Bruening2003}.}
\begin{center}
\begin{tabular}{|p{0.55\linewidth} | p{0.35\linewidth}|}
 \hline
 Client Routine             & Description\\
 \hline
 \hline
 void dynamorio\_init()     & Client initialization\\
 \hline
 void dynamorio\_exit()     & Client finalization\\
 \hline
 void dynamorio\_thread\_init(void *context)     & Client per-thread initialization\\
 \hline
 void dynamorio\_thread\_exit(void *context)     & Client per-thread finalization\\
 \hline
 void dynamorio\_basic\_block(void *context, app\_pc tag, InstrList *bb)     & Client processing of basic block\\
 \hline
 void dynamorio\_trace(void *context, app\_pc tag, InstrList *trace)     & Client processing of trace\\
 \hline
 void dynamorio\_fragment\_deleted(void *context, app\_pc tag)     & Notifies client when a fragment is deleted from the code cache\\
 \hline
 int dynamorio\_end\_trace(void *context, app\_pc trace\_tag, app\_pc next\_tag)     & Asks client whether to end the current trace\\
 \hline
\end{tabular}
\end{center}
\label{tab:dynamorio_api}
\end{table}

Most of the hooks receive a \texttt{void *context} pointer to the thread-local machine context through its parameter list, which then needs to be passed to the code manipulation routines.
These routines are part of DynamoRIO's rich code manipulation API, which supports generating, encoding and decoding instructions.
Since executing these new instructions might alter the processor's flags and general purpose registers, they have to be saved before and restored after execution to guarantee transparency.
DynamoRIO also provides routines for clients to store these flags and registers in thread-local slots.
As an alternative to saving and restoring them manually, clients can use so-called \textit{clean calls}, as already outlined in Section \ref{sec:dbi}, in which DynamoRIO takes the responsibility for saving and restoring the processor's state.
The clean call then redirects the program counter to a user-defined function, which is thus executed every time the basic block runs.
This has the great advantage that the observational or manipulative behavior does not have to be implemented in assembly instructions; instead, the client's compiler translates the clean call function into machine code.
However, since DynamoRIO in general cannot know which registers the user code modifies, it has to preserve the entire processor state.
Dispatching to the clean call function is essentially a context switch and therefore has a significant impact on performance.
It is thus up to the user to decide whether the performance gained by avoiding clean calls outweighs the higher development effort.
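
The trade-off can be sketched in plain C with a hypothetical toy machine state \texttt{cpu\_t}. The functions below are illustrative and not DynamoRIO's API (in current DynamoRIO versions, clean calls are inserted with \texttt{dr\_insert\_clean\_call}). The clean-call variant spills the complete state around an ordinary C function, whereas the inline variant spills only the register \texttt{r[0]} and the flags that its snippet is known to modify. Both leave the application-visible state unchanged.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_REGS 16

/* Hypothetical toy machine state: general purpose registers + flags. */
typedef struct { uint64_t r[NUM_REGS]; uint64_t flags; } cpu_t;

static uint64_t counter;   /* the client's own data */

/* User function for the clean call: clobbers r0 and the flags. */
static void count_fn(cpu_t *cpu)
{
    cpu->r[0] = counter + 1;   /* r0 used as scratch register */
    cpu->flags ^= 1;           /* arithmetic updates the flags */
    counter = cpu->r[0];
}

/* Clean-call style: save everything, run the C function, restore
 * everything. Returns the number of state words spilled. */
static int clean_call(cpu_t *cpu, void (*fn)(cpu_t *))
{
    cpu_t saved = *cpu;
    fn(cpu);
    *cpu = saved;
    return NUM_REGS + 1;
}

/* Inline style: the snippet is known to touch only r0 and the flags,
 * so only those are spilled. Returns the number of words spilled. */
static int inline_count(cpu_t *cpu)
{
    uint64_t spill_r0 = cpu->r[0], spill_flags = cpu->flags;
    cpu->r[0] = counter + 1;
    cpu->flags ^= 1;
    counter = cpu->r[0];
    cpu->r[0] = spill_r0;          /* restore application state */
    cpu->flags = spill_flags;
    return 2;
}
```

In this model, the clean call spills 17 state words per execution and the inline variant only 2; in return, the inline variant requires the client to know exactly which state its inserted code modifies.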

One example of a client shipped with DynamoRIO is \textit{DrCacheSim}.
Together with the \textit{DrMemtrace} framework, this client provides an easy way to trace the instructions executed by the application and the memory accesses it performs.
This framework will be further explained in Section \ref{sec:analysis_tool}.