Fix some typos
This leads to an overall lower cache hit rate than the other two policies.

In a fully associative cache, a memory reference can be placed anywhere; consequently, all cache lines have to be fetched and compared to the tag.
Although this policy has the highest potential cache hit rate, the high space consumption due to comparators and the high power consumption due to the lookup process make it infeasible for many systems.

The hybrid approach of set-associative caches offers a trade-off between both policies.
The term \textit{associativity} denotes the number of cache lines that are contained in a set.
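The address decomposition of a set-associative cache can be sketched as follows; the line size, set count, and associativity below are illustrative assumptions, not values from the text:

```python
# Illustrative address split for a set-associative cache.
# Parameters are assumptions for this sketch.
LINE_SIZE = 64       # bytes per cache line
NUM_SETS = 64        # number of sets
ASSOCIATIVITY = 8    # lines per set (the associativity)

def split_address(addr: int):
    """Split an address into (tag, set index, line offset)."""
    offset = addr % LINE_SIZE
    index = (addr // LINE_SIZE) % NUM_SETS   # selects one set
    tag = addr // (LINE_SIZE * NUM_SETS)     # compared within that set
    return tag, index, offset
```

Only the `ASSOCIATIVITY` lines of the selected set need tag comparators, which is why this design sits between direct-mapped and fully associative caches.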

\subsection{Replacement Policies}
\label{sec:replacement_policies}

\subsection{Virtual Addressing}
\label{sec:caches_virtual_addressing}

Operating systems use virtual addressing to isolate the memory spaces of user space programs from each other, giving each process its own virtual address space.

\textit{Virtual addresses} are composed of a \textit{virtual page number} and a \textit{page offset}.
The virtual page number is the only part that is actually virtual; the page offset is the same for the virtual and the physical address.
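This split can be sketched as follows, assuming 4\,KiB pages (the page size and the page-table contents are assumptions for the example):

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages

def split_virtual_address(vaddr: int):
    vpn = vaddr // PAGE_SIZE          # virtual page number: must be translated
    page_offset = vaddr % PAGE_SIZE   # identical in virtual and physical address
    return vpn, page_offset

def translate(vaddr: int, page_table: dict) -> int:
    """Translate a virtual address using a toy page table (VPN -> PPN)."""
    vpn, offset = split_virtual_address(vaddr)
    ppn = page_table[vpn]             # physical page number
    return ppn * PAGE_SIZE + offset   # offset is carried over unchanged
```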

However, as long as the physical address is not present, the data cache cannot look up its entries, as the index is not known yet.
So the cache has to wait on the TLB, or worse, on multiple memory accesses.
To circumvent this problem, the cache can be indexed by the virtual address, which makes it possible to parallelize both procedures.
Such a cache is called \textit{virtually indexed} and \textit{physically tagged} and is illustrated in figure \ref{fig:virtual_address_conversion}.
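Whether the cache index may extend beyond the page offset is exactly the point where virtually indexed, physically tagged caches become delicate: if all index bits lie inside the page offset, the index is identical for the virtual and the physical address and no aliasing can occur. A minimal check of this condition (the concrete sizes are illustrative assumptions):

```python
# Check whether a VIPT cache can be indexed purely with page-offset bits,
# i.e. the index never depends on address translation.
def index_fits_in_page_offset(cache_size: int, associativity: int,
                              line_size: int, page_size: int = 4096) -> bool:
    num_sets = cache_size // (associativity * line_size)
    # index + line-offset bits together address num_sets * line_size bytes
    return num_sets * line_size <= page_size
```

For example, a 32 KiB 8-way cache with 64-byte lines has 64 sets, so its index and offset bits exactly fill a 4 KiB page offset; at 4-way associativity the index would spill into translated bits.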

% Is the illustration from the book correct? Should the cache index really extend beyond the page offset?

The general architecture of DRAMSys is illustrated in figure \ref{fig:dramsys}.
Several initiators can be connected to the arbiter, sending requests to the DRAM subsystem.
An initiator can either be a sophisticated processor model like the gem5 out-of-order processor model \cite{Binkert2011} or a trace player that simply replays a trace file containing a sequence of memory requests and timestamps.

To support a large variety of DRAM standards robustly and error-free, DRAMSys uses a formal domain-specific language based on Petri nets called DRAMml.
This language captures a standard's timing dependencies between all DRAM commands and compiles to the source code of the internal timing checkers that ensure compliance with the specific standard \cite{Jung2017a}.

Because a single memory access can cause the issuance of multiple commands (e.g. precharge (\texttt{PRE}), activate (\texttt{ACT}), read (\texttt{RD}) or write (\texttt{WR})), the four-phase handshake of the TLM-AT protocol is not sufficient.
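The reason one access expands into several commands is the bank's row buffer: a \texttt{RD} or \texttt{WR} can only be issued once the target row is open. A minimal sketch of this expansion (the function and its state handling are illustrative, not DRAMSys internals):

```python
# Sketch: which DRAM commands a single access to one bank expands into,
# depending on which row (if any) is currently open in that bank.
def commands_for_access(open_row, target_row, is_write):
    cmds = []
    if open_row is not None and open_row != target_row:
        cmds.append("PRE")   # row conflict: close the open row first
    if open_row != target_row:
        cmds.append("ACT")   # open (activate) the target row
    cmds.append("WR" if is_write else "RD")
    return cmds
```

A row hit needs only one command, while a row conflict needs three, which is why a simple request/response handshake cannot model the command stream.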

% Possibly TA, if images are used?
DRAMSys also provides the so-called \textit{Trace Analyzer}, a graphical tool that visualizes database files created by DRAMSys.
It shows the \texttt{REQ} and \texttt{RESP} phases between the initiator and the arbiter, the occupation of the command bus and the data bus, as well as representations of the different phases in the DRAM banks.
An example trace database, visualized in the Trace Analyzer, is shown in figure \ref{fig:traceanalyzer}.
Furthermore, the Trace Analyzer is capable of calculating numerous metrics and creating plots of interesting characteristics.


Besides the type, the \revabbr{process identifier}{PID} and \revabbr{thread identifier}{TID} of the initiating process and thread are included in every record.
For an instruction marker, the size of the instruction as well as the virtual address of the instruction in the memory map is provided.
For data references, the address and size of the desired access are provided, as well as the \revabbr{program counter}{PC} from where it was initiated.
In offline mode, DrCacheSim stores the current mapping of all binary executables and shared libraries in a separate file, so that it is possible to decode named instructions even after the application has exited.
In case of online tracing, the analyzer has to inspect the memory of the client-side process for this.

Analysis tools implement the \texttt{analysis\_tool\_t} interface, as this enables the analyzer to forward a received record to multiple tools in a polymorphic manner.
In particular, the \texttt{process\_memref\_t()} method of any tool is called for every incoming record.
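The forwarding scheme can be illustrated with a small Python analogue; the actual DrCacheSim interface is C++ (\texttt{analysis\_tool\_t}), so all class and method names here are only stand-ins for illustration:

```python
# Python analogue of the described design: the analyzer forwards every
# record to all registered tools through a common interface.
class AnalysisTool:
    def process_memref(self, record):
        raise NotImplementedError

class CountingTool(AnalysisTool):
    """Toy tool that just counts the records it receives."""
    def __init__(self):
        self.count = 0
    def process_memref(self, record):
        self.count += 1

class Analyzer:
    def __init__(self, tools):
        self.tools = tools
    def feed(self, record):
        for tool in self.tools:       # polymorphic dispatch to every tool
            tool.process_memref(record)
```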

The newly developed DRAMTracer tool creates a separate trace file for every thread of the application.
As it is not known how many threads an application will spawn, the tool listens for records with new TIDs that it has not registered yet.
For every data reference, a new entry in the corresponding trace file is made which contains the size and the physical address of the access, whether it was a read or a write, and also a count of (computational) instructions that have been executed since the last reference.
This instruction count is used to approximate the delay between the memory accesses when the trace is replayed by DRAMSys.
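The bookkeeping described above can be sketched as follows; the field and method names are illustrative, not the actual DRAMTracer implementation, and entries are collected in memory instead of written to files:

```python
# Sketch of per-thread trace bookkeeping: one trace per TID, each data
# reference annotated with the instruction count since the last reference.
class DramTracerSketch:
    def __init__(self):
        self.traces = {}       # TID -> list of trace entries
        self.instr_since = {}  # TID -> instructions since last data reference

    def on_instruction(self, tid):
        self.instr_since[tid] = self.instr_since.get(tid, 0) + 1

    def on_data_ref(self, tid, addr, size, is_write):
        trace = self.traces.setdefault(tid, [])   # unseen TID -> new trace
        trace.append({
            "addr": addr, "size": size, "write": is_write,
            "delay_instrs": self.instr_since.get(tid, 0),
        })
        self.instr_since[tid] = 0                 # reset the delay counter
```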

As the memory accesses are directly extracted from the executed instructions, simply sending a transaction to the DRAM subsystem for every data reference would neglect the caches of today's processors completely.
Therefore, a cache model is also required, whose implementation will be explained in more detail in section \ref{sec:cache_implementation}.
Many modern cache hierarchies consist of three cache levels: two caches for every processor core, the L1 and the L2 cache, and one cache that is shared across all cores, the L3 cache.
This hierarchy is also reflected in the DbiPlayer, as shown in Figure \ref{fig:dbiplayer_with_caches}.
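The lookup order through such a hierarchy can be sketched with a deliberately simplified model (no sets, tags, or eviction; all names are illustrative, not the DbiPlayer's cache model):

```python
# Toy three-level lookup: per-core L1 and L2, shared L3. A miss at one
# level fills that level and falls through to the next; a miss in L3
# would go to the DRAM subsystem.
class SimpleCache:
    def __init__(self, name, next_level=None):
        self.name = name
        self.lines = set()        # set of cached line addresses
        self.next_level = next_level

    def access(self, line_addr, path):
        path.append(self.name)    # record which levels were consulted
        if line_addr in self.lines:
            return True           # hit at this level
        self.lines.add(line_addr) # fill on miss (no capacity modeled)
        if self.next_level:
            return self.next_level.access(line_addr, path)
        return False              # miss in all levels -> DRAM access

l3 = SimpleCache("L3")                              # shared last level
l1 = SimpleCache("L1", SimpleCache("L2", l3))       # one core's hierarchy
```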

\begin{landscape}
Therefore, cache coherency is not guaranteed, and memory shared between multiple processor cores will lead to incorrect results, as the values are not synchronized between the caches.
However, it is to be expected that this will not drastically affect the simulation results for applications with few shared resources.
The implementation of a snooping protocol is a candidate for future improvements.

\subsection{Trace Player Interface}
\label{sec:traceplayer_interface}

\subsection{Interconnect}
\label{sec:interconnect}