\section{Introduction}
\label{sec:introduction}
% TODO: possibly also explain why DRAMs are being used more and more
Today's computing systems accompany us in almost all areas of life in the form of smart devices, computers, or game consoles.
With the increasing performance requirements on these devices, not only faster processors are needed but also high-performance memory systems, namely \revabbr{dynamic random-access memories}{DRAMs}, which are expected to deliver high bandwidth at low latency.
These memory systems are very complex and offer a lot of room for configuration, e.g., the DRAM standard, the memory controller configuration, or the address mapping, while different applications place very different requirements on them \cite{Gomony2012}.
Consequently, system designers are entrusted with the complex task of finding the most effective configurations that meet the performance and power constraints, with optimizations tailored to the specific use case.

\input{img/thesis.tikzstyles}
\begin{figure}[!ht]
\begin{center}
\tikzfig{img/cloud}
\caption{Exemplary DRAM configuration parameters to consider when designing a system.}
\label{fig:cloud}
\end{center}
\end{figure}

For exploring the design space of these configurations, it is impractical to use real systems, as they are too cost-intensive and not modifiable and are therefore not suitable for rapid prototyping.
To overcome this limitation, it is important to simulate the memory system using a sufficiently accurate simulation framework.

Such a simulation framework is DRAMSys \cite{Steiner2020, Jung2017}, which is based on SystemC \revabbr{transaction level modeling}{TLM} and enables the fast, cycle-accurate simulation of numerous DRAM standards and controller configurations.
Stimuli for the memory system can be generated using a prerecorded trace file with timestamps, a traffic generator that acts as a state machine and initiates different request patterns, or a detailed processor model of the gem5 \cite{Binkert2011} simulation framework.

However, the former two methods lack accuracy, whereas the latter may provide sufficient precision but is very time-consuming.
To fill this gap of fast but accurate traffic generation, a new simulation frontend for DRAMSys is developed and presented in this thesis.

The methodology this new framework is based on is dynamic binary instrumentation.
It allows the extraction of memory accesses of multi-threaded applications while they are executed on real hardware.
These memory access traces are then played back using a simplified core model and are filtered by a cache model before the memory requests are passed to the DRAM.
This allows accurate modeling of the system and the variation of numerous configuration parameters in a short time.

The remainder of the thesis is structured as follows:
Section \ref{sec:dynamorio} introduces DynamoRIO, the dynamic binary instrumentation framework used in this thesis.
Section \ref{sec:systemc} presents the modeling language SystemC, on which the developed core and cache models are based.
After that, Section \ref{sec:caches} gives a short overview of modern cache architectures and their high-level implementations.
Section \ref{sec:dramsys} introduces the DRAMSys simulation framework and its basic functionalities.
Section \ref{sec:implementation} explains the implementation of the cache model, the processor model and the instrumentation tool in detail.
In Section \ref{sec:simulation_results}, the accuracy of the new framework is compared against the gem5 and Ramulator \cite{Kim2016} simulators. Finally, Section \ref{sec:future_work} outlines possible future improvements.