Caches are faster than DRAM, but only provide a small capacity, as the per-bit cost is larger.
For this reason, at least the \textit{working set}, the data that the currently running application is working on, should be stored in the cache.

The two most important heuristics that make this possible will be explained in section \ref{sec:caches_locality_principles}.
After that the typical structure of a cache will be discussed in \ref{sec:caches_logical_organization}.
Replacement policies will be explained in \ref{sec:replacement_policies} and write policies in \ref{sec:write_policies}, followed by the considerations regarding virtual addressing in section \ref{sec:caches_virtual_addressing}.
Finally, the advantage of non-blocking caches is the topic of section \ref{sec:caches_non_blocking_caches}.

\subsection{Locality Principles}
\label{sec:caches_locality_principles}

Those two heuristics are called \textit{temporal locality} and \textit{spatial locality}.

\subsubsection{Temporal Locality}

Temporal locality is the concept of referenced data being likely to be referenced again in the near future.
Taking advantage of this is the main idea behind a cache:
When new data is referenced, it will be read from the main memory and buffered in the cache.
The processor can now perform operations on this data and use the results further without needing to access the main memory.

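To make this behavior concrete, the following sketch (illustrative Python, not part of the hardware described here; all names such as \texttt{TinyCache} are made up) models a cache where only the first reference to an address has to go to the slow main memory:

```python
# Minimal illustration of temporal locality: repeated accesses to the
# same address are served from the cache after the first miss.

class TinyCache:
    def __init__(self):
        self.lines = {}          # address -> buffered data
        self.hits = 0
        self.misses = 0

    def read(self, memory, addr):
        if addr in self.lines:   # temporal locality pays off here
            self.hits += 1
        else:
            self.misses += 1
            self.lines[addr] = memory[addr]  # fetch from slow main memory
        return self.lines[addr]

memory = {0x10: 42}
cache = TinyCache()
for _ in range(4):               # the same address is referenced repeatedly
    value = cache.read(memory, 0x10)

print(cache.misses, cache.hits)  # -> 1 3
```

Only the first of the four references is a miss; the remaining three are served without a main memory access.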
\subsubsection{Spatial Locality}

Programs have a tendency to reference data that is close, in the memory space, to already referenced data.
This tendency, spatial locality, arises because related data is often clustered together, for example in arrays or structures.
When calculations are performed on those arrays, sequential access patterns can be observed as one element is processed after the other.

Spatial locality can be exploited by organizing blocks of data in so-called \textit{cache blocks} or \textit{cache lines}, which are larger than a single data word.
This is a passive form of making use of spatial locality, as referencing data will also cause nearby words to be loaded into the same cache line, making them available for further accesses.

An active form of exploiting spatial locality is the use of \textit{prefetching}.

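The passive effect of cache lines can be sketched as follows (illustrative Python; the line size of four words is an arbitrary example value): a miss loads a whole line, so a sequential scan over an array hits on every word except the first of each line.

```python
# Sketch of spatial locality: the cache fetches whole lines of
# LINE_SIZE words, so sequential accesses mostly hit.
LINE_SIZE = 4                          # words per cache line (example value)

class LineCache:
    def __init__(self):
        self.lines = {}                # line number -> list of words
        self.hits = 0
        self.misses = 0

    def read(self, memory, addr):
        line, offset = divmod(addr, LINE_SIZE)
        if line not in self.lines:
            self.misses += 1           # fetch the whole line, not one word
            base = line * LINE_SIZE
            self.lines[line] = memory[base:base + LINE_SIZE]
        else:
            self.hits += 1             # a neighboring word was loaded earlier
        return self.lines[line][offset]

memory = list(range(16))               # a small "array" in main memory
cache = LineCache()
total = sum(cache.read(memory, a) for a in range(16))  # sequential scan

print(cache.misses, cache.hits)        # -> 4 12
```

With 16 words and 4-word lines, only 4 of the 16 accesses go to main memory.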
To determine which cache line of the corresponding set should be evicted, there are several replacement policies:
The \revabbr{least recently used}{LRU} policy selects the cache line whose last usage is the longest time ago.
An LRU algorithm is expensive to implement, as a counter value for every cache line of a set has to be updated every time the set is accessed.
\item
An alternative is a \revabbr{pseudo LRU}{PLRU} policy, where an extra bit is set to 1 every time a cache line is accessed.
When the extra bit of every cache line in a set is set to 1, the bits are reset to 0.
In case of contention, the first cache line whose extra bit is 0 will be evicted, which indicates that its last usage was likely some time ago.
\item
In the \revabbr{least frequently used}{LFU} policy, every time a cache line is accessed, a counter value is increased.
The cache line with the lowest value, the least frequently used one, will be chosen to be evicted.
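The one-bit PLRU scheme can be sketched for a single set as follows (illustrative Python; one common variant is assumed here in which the just-accessed line keeps its bit set after the reset, so the most recent access is not forgotten):

```python
# Sketch of the one-bit pseudo-LRU scheme for one cache set.
# Assumption: when setting a bit would make every bit 1, all other
# bits are cleared and only the just-accessed line keeps its bit.

class PLRUSet:
    def __init__(self, ways):
        self.bits = [0] * ways

    def access(self, way):
        self.bits[way] = 1
        if all(self.bits):                 # every line marked as recently used
            self.bits = [0] * len(self.bits)
            self.bits[way] = 1             # keep the most recent access marked

    def victim(self):
        # evict the first line whose bit is 0 (likely not used recently)
        return self.bits.index(0)

s = PLRUSet(4)
for way in (0, 1, 2):
    s.access(way)
print(s.victim())      # -> 3, the only line not recently accessed
s.access(3)            # all bits would become 1 -> reset, keep way 3
print(s.bits)          # -> [0, 0, 0, 1]
```

Compared to true LRU, only one bit per line has to be maintained instead of a full counter per line.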
Also here, a write buffer can be used to place the actual write back requests in a queue.
\subsection{Virtual Addressing}
\label{sec:caches_virtual_addressing}

Operating systems use virtual addressing to isolate the memory spaces of user space programs from each other, giving each process its own virtual address space.

\textit{Virtual addresses} are composed of a \textit{virtual page number} and a \textit{page offset}.
The virtual page number is the part that is actually virtual; the page offset is the same for the virtual and the physical address.
Figure \ref{fig:virtual_address} shows an exemplary division of a virtual address into its components.

\begin{figure}[!ht]
\begin{center}
\tikzfig{img/virtual_address}
\caption{Exemplary division of the virtual address into a virtual page number and page offset.}
\label{fig:virtual_address}
\end{center}
\end{figure}

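The division can be expressed as two bit operations (illustrative Python; 4\,KiB pages, i.e. 12 offset bits, are assumed here as an example):

```python
# Splitting a virtual address into virtual page number and page offset,
# assuming 4 KiB pages (12 offset bits); the widths are illustrative.
PAGE_OFFSET_BITS = 12
PAGE_SIZE = 1 << PAGE_OFFSET_BITS

def split(vaddr):
    vpn = vaddr >> PAGE_OFFSET_BITS          # virtual page number
    offset = vaddr & (PAGE_SIZE - 1)         # unchanged by translation
    return vpn, offset

vpn, offset = split(0x12345678)
print(hex(vpn), hex(offset))   # -> 0x12345 0x678
```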
Before a process can access a specific region in memory, the virtual page number has to be translated into a physical page number.
For this translation, so-called \textit{page tables} are used to look up the physical page number.
Page tables are usually multiple levels deep (e.g. 4 levels on x86), so a single translation can cause up to 4 memory accesses, which is expensive.
To improve performance, a \revabbr{translation lookaside buffer}{TLB} is used that acts like a cache on its own for physical page numbers.

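The cost of a table walk and the benefit of the TLB can be sketched as follows (illustrative Python with a toy two-level table of 4 index bits per level; real page tables have more levels and wider indices):

```python
# Sketch of address translation: a TLB hit avoids the page table walk,
# a miss costs one memory access per table level.
PAGE_OFFSET_BITS = 12
LEVEL_BITS = 4                               # toy: 4 index bits per level

tlb = {}                                     # vpn -> physical page number
memory_accesses = 0

# toy two-level page table: top level -> second level -> physical page
page_table = {0x1: {0x2: 0xAB}}

def translate(vaddr):
    global memory_accesses
    vpn = vaddr >> PAGE_OFFSET_BITS
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    if vpn in tlb:                           # TLB hit: no table walk
        ppn = tlb[vpn]
    else:                                    # walk: one access per level
        top = vpn >> LEVEL_BITS
        low = vpn & ((1 << LEVEL_BITS) - 1)
        memory_accesses += 1
        second = page_table[top]
        memory_accesses += 1
        ppn = second[low]
        tlb[vpn] = ppn                       # remember the translation
    return (ppn << PAGE_OFFSET_BITS) | offset

paddr = translate(0x12345)      # miss: two memory accesses for the walk
again = translate(0x12345)      # hit: served from the TLB
print(hex(paddr), memory_accesses)   # -> 0xab345 2
```

The second translation of the same page causes no further memory accesses.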
However, as long as the physical address is not available, the data cache cannot look up its entries, as the index is not known yet.
So the cache has to wait for the TLB, or worse, for multiple memory accesses.
To circumvent this problem, the cache can be indexed by the virtual address, which makes it possible to parallelize the cache lookup and the address translation.
Such a cache is called \textit{virtually indexed} and \textit{physically tagged} and is illustrated in figure \ref{fig:virtual_address_conversion}.

% Is the illustration from the book correct? Should the cache index really extend beyond the page offset?
\begin{figure}[!ht]
\begin{center}
\tikzfig{img/virtual_address_conversion}
\caption{Virtually indexed, physically tagged cache.\cite{Jacob2008} ASID refers to address-space identifier.}
\label{fig:virtual_address_conversion}
\end{center}
\end{figure}

The result from the TLB, the physical page number, needs to be compared to the tag that is stored in the cache.
When the tag and the physical page number match, the cache entry is valid for this virtual address.
Note that when the cache index is completely contained in the page offset, another problem called \textit{aliasing} is resolved, which will not be discussed further in this thesis.

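The lookup can be sketched as follows (illustrative Python; the block and index widths are example values chosen so that the index fits entirely into the 12-bit page offset, matching the aliasing remark above): the set index uses only page-offset bits and can therefore be computed in parallel with the TLB access, while the tag comparison uses the translated physical page number.

```python
# Sketch of a virtually indexed, physically tagged lookup.
PAGE_OFFSET_BITS = 12
BLOCK_BITS = 6                   # 64-byte cache lines (example)
INDEX_BITS = 6                   # 64 sets; index fits into the page offset

def cache_index(vaddr):
    # uses only page-offset bits, which are identical in the
    # physical address, so no translation is needed for indexing
    return (vaddr >> BLOCK_BITS) & ((1 << INDEX_BITS) - 1)

def hit(stored_tag, ppn):
    return stored_tag == ppn     # physical tag comparison after the TLB

vaddr = 0x7012                   # vpn 0x7, page offset 0x012
ppn = 0x42                       # result of the (parallel) TLB lookup
idx = cache_index(vaddr)
print(idx, hit(0x42, ppn))       # -> 0 True
```

Because \texttt{BLOCK\_BITS + INDEX\_BITS} equals the page-offset width here, the index never extends beyond the page offset.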
\subsection{Non-blocking Caches}
\label{sec:caches_non_blocking_caches}

In blocking caches, a cache miss requires the processor to stall until the data is fetched from the underlying memory.
As this is a major slowdown, non-blocking caches try to solve this problem, making it possible for the processor to make further progress while waiting for the value.

Similarly to the write buffer previously discussed in \ref{sec:write_policies}, a new buffer is introduced: the \revabbr{miss status holding register}{MSHR}.
The number of MSHRs corresponds to the number of misses the cache can handle concurrently; when all available MSHRs are occupied and a further miss occurs, the cache will block.
An MSHR entry always corresponds to one cache line that is currently being fetched from the underlying memory subsystem.

There are two variants of cache misses:
\textit{Primary misses} are misses that lead to the occupation of another MSHR, whereas \textit{secondary misses} are added to an existing MSHR entry and therefore cannot cause the cache to block.
This is the case when a cache line that is already being fetched is accessed again.

The architecture of an MSHR file is illustrated in figure \ref{fig:mshr_file}.

\begin{figure}[!ht]
\begin{center}
\tikzfig{img/mshr_file}
\caption{Miss status holding register file.\cite{Jahre2007} V refers to a valid bit.}
\label{fig:mshr_file}
\end{center}
\end{figure}
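The behavior of an MSHR file, including the primary/secondary miss distinction, can be sketched as follows (illustrative Python; structure and field names are made up for the example):

```python
# Sketch of an MSHR file: a primary miss occupies a new entry, a
# secondary miss joins an existing one, and the cache blocks when
# all entries are occupied.

class MSHRFile:
    def __init__(self, entries):
        self.entries = entries
        self.pending = {}            # cache line -> list of waiting requests

    def handle_miss(self, line, request):
        if line in self.pending:     # secondary miss: line already in flight
            self.pending[line].append(request)
            return "secondary"
        if len(self.pending) == self.entries:
            return "blocked"         # all MSHRs occupied: cache must block
        self.pending[line] = [request]   # primary miss: occupy a new MSHR
        return "primary"

    def fill(self, line):
        # data arrived: free the MSHR and return the waiting requests
        return self.pending.pop(line)

mshrs = MSHRFile(entries=2)
print(mshrs.handle_miss(0x1, "load A"))   # -> primary
print(mshrs.handle_miss(0x1, "load B"))   # -> secondary (same line)
print(mshrs.handle_miss(0x2, "load C"))   # -> primary
print(mshrs.handle_miss(0x3, "load D"))   # -> blocked (both MSHRs busy)
print(mshrs.fill(0x1))                    # -> ['load A', 'load B']
```

The second miss to line \texttt{0x1} does not occupy a new MSHR, so only a miss to a third distinct line causes the cache to block.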