diff --git a/inc/4.caches.tex b/inc/4.caches.tex
index 4021b5a..5e572ef 100644
--- a/inc/4.caches.tex
+++ b/inc/4.caches.tex
@@ -81,10 +81,10 @@ However, every time new data is referenced that gets placed into the same set, t
-This leads to an overall lower cache hit rate as the other two policies.
+This leads to an overall lower cache hit rate than the other two policies.
 In a fully associative cache, a memory reference can be placed anywhere, consequently all cache lines have to be fetched and compared to the tag.
-Although this policy has the highest potential cache hit rate, the high space consumption due to comparators and high power consumption due to the lookup process, makes it non-feasable for many systems.
+Although this policy has the highest potential cache hit rate, the high space consumption of the comparators and the high power consumption of the lookup process make it infeasible for many systems.
 The hybrid approach of set-associative caches offers a trade-off between both policies.
-The term \textit{associtativity} denotes the number of cache lines that are contained in a set.
+The term \textit{associativity} denotes the number of cache lines that are contained in a set.
 \subsection{Replacement Policies}
 \label{sec:replacement_policies}
@@ -127,7 +127,7 @@ Also here, a write buffer can be used to place the actual write back requests in
 \subsection{Virtual Addressing}
 \label{sec:caches_virtual_addressing}
-Operating systems use virtual addressing to isolate the memory spaces of user space programs from each other, giving each process an own virtal address space.
+Operating systems use virtual addressing to isolate the memory spaces of user space programs from each other, giving each process its own virtual address space.
 \textit{Virtual addresses} are composed of a \textit{virtual page number} and a \textit{page offset}.
 The virtual page number is the actual part that is virtual, the page offset is the same for the virtual and the physical address.
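The split of an address into tag, set index, and line offset that underlies the set-associative organisation above can be sketched in a few lines of C++. This is an illustrative sketch only, not part of the patch; the geometry (64-byte lines, 64 sets) and the function names are assumptions:

```cpp
#include <cassert>
#include <cstdint>

// Illustrative geometry: 64-byte cache lines, 64 sets (assumed values).
constexpr uint64_t kLineSize = 64;
constexpr uint64_t kNumSets  = 64;

// The low bits address the byte within the line, the middle bits select
// the set, and the remaining upper bits form the tag that is compared
// against all lines of the selected set.
uint64_t setIndex(uint64_t addr) {
    return (addr / kLineSize) % kNumSets;
}

uint64_t tag(uint64_t addr) {
    return addr / (kLineSize * kNumSets);
}
```

Two addresses that differ only in their tag map to the same set and therefore compete for its ways, which is why a low associativity limits the hit rate.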
@@ -148,7 +148,7 @@ To improve performance, a \revabbr{translation lookaside buffer}{TLB} is used th
-However, as long as the physical address is not present, the data cache cannot lookup its entries as the index is not known yet.
+However, as long as the physical address is not present, the data cache cannot look up its entries as the index is not known yet.
 So the cache has to wait on the TLB, or worse on multiple memory accesses.
-To circuumvent this problem, the cache can be indexed by the virtual address what makes it possible to parallize both procedures.
+To circumvent this problem, the cache can be indexed by the virtual address, which makes it possible to parallelize both procedures.
 Such a cache is called \textit{virtually indexed} and \textit{physically tagged} and is illustrated in figure \ref{fig:virtual_address_conversion}.
-% Ist die Darstellung aus dem Buch richtig? Sollte der Cache Index wirklich über den Page Offset hinaus gehen?
+% Is the illustration from the book correct? Should the cache index really extend beyond the page offset?
diff --git a/inc/5.dramsys.tex b/inc/5.dramsys.tex
index 644a458..0e57b57 100644
--- a/inc/5.dramsys.tex
+++ b/inc/5.dramsys.tex
@@ -18,7 +18,7 @@ The general architecture of DRAMSys is illustrated in figure \ref{fig:dramsys}.
 Several initiators can be connected to the arbiter, sending requests to the DRAM subsystem.
 An initiator can either be a sophisticated processor model like the gem5 out of order processor model \cite{Binkert2011} or a trace player that simply replays a trace file containing a sequence of memory requests and timestamps.
-To support a large variety of DRAM standards robustly and error-free, DRAMSys uses a formal domain specific language based on petri nets called DRAMml.
+To support a large variety of DRAM standards robustly and error-free, DRAMSys uses a formal domain specific language based on Petri nets called DRAMml.
-This language includes a standards timing dependencies between all DRAM commands and compiles to source code of the internal timing checkers that ensure compliance to the specific standard \cite{Jung2017a}.
+This language includes a standard's timing dependencies between all DRAM commands and compiles to source code of the internal timing checkers that ensure compliance to the specific standard \cite{Jung2017a}.
 Because a single memory access can cause the issuance of multiple commands (e.g.
-precharge (\texttt{PRE}), activate (\texttt{ACT}), read (\texttt{RD}) or write (\texttt{WR})), the four phase handshake of the TLM-AT protocol is not sufficient enough.
+precharge (\texttt{PRE}), activate (\texttt{ACT}), read (\texttt{RD}) or write (\texttt{WR})), the four phase handshake of the TLM-AT protocol is not sufficient.
@@ -26,7 +26,7 @@ Therefore, a custom TLM protocol called DRAM-AT is used as the communication pro
-% Evtl TA falls Bilder genutzt werden?
+% Possibly TA if images are used?
 DRAMSys also provides the so-called \textit{Trace Analyzer}, a graphical tool that visualizes database files created by DRAMSys.
-It shows the \texttt{REQ} and \texttt{RESP} phases between the initiator and the arbiter, the occupation of the command bus and data bus as well as represenstations of the different phases in the DRAM banks.
+It shows the \texttt{REQ} and \texttt{RESP} phases between the initiator and the arbiter, the occupation of the command bus and data bus as well as representations of the different phases in the DRAM banks.
-An example trace database, visualized in the Trace Analyzer is shown in figure \ref{fig:traceanalyzer}.
+An example trace database, visualized in the Trace Analyzer, is shown in figure \ref{fig:traceanalyzer}.
 Furthermore, the Trace Analyzer is capable of calculating numerous metrics and creating plots of interesting characteristics.
diff --git a/inc/6.implementation.tex b/inc/6.implementation.tex
index 41844f9..103ed9b 100644
--- a/inc/6.implementation.tex
+++ b/inc/6.implementation.tex
@@ -49,13 +49,13 @@ A \texttt{memref\_t} can either represent an instruction, a data reference or a
-Besides of the type, the \revabbr{process identifier}{PID} and \revabbr{thread identifier}{TID} of the initiating process and thread is included in every record.
+Besides the type, the \revabbr{process identifier}{PID} and \revabbr{thread identifier}{TID} of the initiating process and thread are included in every record.
 For an instruction marker, the size of the instruction as well as the virtual address of the instruction in the memory map is provided.
-For data references, the address and size of the desired access is provided as well the \revabbr{program counter}{PC} from where it was initiated.
+For data references, the address and size of the desired access are provided as well as the \revabbr{program counter}{PC} from where it was initiated.
-In offline mode, DrCacheSim stores the current mapping of all binary executables and shared libraries in a seperate file, so that it is possible to decode named instructions even after the application has exited.
+In offline mode, DrCacheSim stores the current mapping of all binary executables and shared libraries in a separate file, so that it is possible to decode named instructions even after the application has exited.
 In case of online tracing, the analyzer has to inspect the memory of the client-side process for this.
 Analysis tools implement the \texttt{analysis\_tool\_t} interface as this enables the analyzer to forward a received record to multiple tools in a polymorphic manner.
 In particular, the \texttt{process\_memref\_t()} method of any tool is called for every incoming record.
-The newly developed DRAMTracer tool creates for every thread of the application a seperate trace file.
+The newly developed DRAMTracer tool creates a separate trace file for every thread of the application.
 As it is not known how many threads an application will spawn, the tool will listen for records with new TIDs that it did not register yet.
 For every data reference, a new entry in the corresponding trace file is made which contains the size and the physical address of the access, whether it was a read or write, and also a count of (computational) instructions that have been executed since the last reference.
 This instruction count is used to approximate the delay between the memory accesses when the trace is replayed by DRAMSys.
@@ -105,8 +105,7 @@ For the DbiPlayer, an additional interconnect module will bundle up all \\ \text
 As the memory accesses are directly extracted from the executed instructions, simply sending a transaction to the DRAM subsystem for every data reference would neglect the caches of today's processors completely.
-Therefore, also a cache model is required whose implementation will be explained in more detail in section \ref{sec:cache_implementation}.
+Therefore, a cache model is also required, whose implementation will be explained in more detail in section \ref{sec:cache_implementation}.
-Modern cache hierarchies compose of 3 cache levels: 2 caches for every processor core, the L1 and L2 cache, and one cache that is shared across all cores, the L3 cache.
-% (vlt hier Literaturreferenz)
+Many modern cache hierarchies consist of three cache levels: two caches for every processor core, the L1 and the L2 cache, and one cache that is shared across all cores, the L3 cache.
 This hierarchy is also reflected in the DbiPlayer as shown in Figure \ref{fig:dbiplayer_with_caches}.
 \begin{landscape}
@@ -179,7 +178,9 @@ It is to note that the current implementation does not utilize a snooping protoc
-Therefore, cache coherency is not guaranteed and memory shared between multiple processor cores will result in incorrect results as the values are not synchronized between the caches.
+Therefore, cache coherency is not guaranteed, and memory shared between multiple processor cores will lead to incorrect results, as the values are not synchronized between the caches.
-However, it is to expect that this will not drastically affect the simulation results for applications with few shared resources.
+However, this is not expected to drastically affect the simulation results for applications with few shared resources.
 The implementation of a snooping protocol is a candidate for future improvements.
-%However, it is to expect that this will not drastically affect the simulation results.
-\subsection{A New Trace Player Interface}
+\subsection{Trace Player Interface}
 \label{sec:traceplayer_interface}
+
+\subsection{Interconnect}
+\label{sec:interconnect}