Non-blocking caches

2022-05-25 19:13:05 +02:00
parent 696b2b05d2
commit b8d75bf8f1
8 changed files with 237 additions and 19 deletions


Therefore caches are used, whose goal is to decrease the latency and increase the bandwidth of memory accesses.
Caches are faster than DRAM, but only provide a small capacity, as the per-bit cost is larger.
For this reason, at least the \textit{working set}, the data that the currently running application is working on, should be stored in the cache.
The two most important heuristics that make this possible will be explained in section \ref{sec:caches_locality_principles}.
After that, the typical structure of a cache will be discussed in section \ref{sec:caches_logical_organization}.
Replacement policies will be explained in section \ref{sec:replacement_policies} and write policies in section \ref{sec:write_policies}, followed by the considerations to make when it comes to virtual addressing in section \ref{sec:caches_virtual_addressing}.
Finally, the advantage of non-blocking caches is the topic of section \ref{sec:caches_non_blocking_caches}.
\subsection{Locality Principles}
\label{sec:caches_locality_principles}
Those two heuristics are called \textit{temporal locality} and \textit{spatial locality}.
\subsubsection{Temporal Locality}
Temporal locality is the concept of referenced data being likely to be referenced again in the near future.
Taking advantage of this is the main idea behind a cache:
when new data is referenced, it is read from the main memory and buffered in the cache.
The processor can then perform operations on this data and reuse the results without needing to access the main memory again.
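As a minimal illustration (hypothetical names, not an actual hardware design), a cache that exploits temporal locality can be sketched as a small buffer in front of a slower backing store:

```python
# Toy sketch of temporal locality: the first reference misses and fetches
# from "DRAM"; every repeated reference to the same word hits the cache.
class TinyCache:
    def __init__(self, memory):
        self.memory = memory          # backing store, models the main memory
        self.lines = {}               # address -> buffered value
        self.hits = 0
        self.misses = 0

    def read(self, addr):
        if addr in self.lines:        # temporal locality pays off here
            self.hits += 1
        else:
            self.misses += 1
            self.lines[addr] = self.memory[addr]  # expensive fetch from "DRAM"
        return self.lines[addr]

memory = {0x100: 7}
cache = TinyCache(memory)
for _ in range(4):                    # the same word is referenced repeatedly
    cache.read(0x100)
print(cache.misses, cache.hits)       # 1 miss, then 3 hits
```

Only the first of the four references reaches the main memory; the remaining ones are served from the cache.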
\subsubsection{Spatial Locality}
Programs have a tendency to reference data that is located near already referenced data in memory.
This tendency, called spatial locality, arises because related data is often clustered together, for example in arrays or structures.
When calculations are performed on those arrays, sequential access patterns can be observed as one element is processed after the other.
Spatial locality can be exploited by organizing data in so-called \textit{cache blocks} or \textit{cache lines}, which are larger than a single data word.
This is a passive form of exploitation: a reference to one word also causes nearby words to be loaded into the same cache line, making them available for later accesses.
An active form of exploiting spatial locality is the use of \textit{prefetching}.
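The effect of cache lines on a sequential access pattern can be sketched as follows (the 64-byte line size is a typical value, assumed here for illustration):

```python
LINE_SIZE = 64   # bytes per cache line (a common size; an assumption here)

# Toy sketch of spatial locality: a miss loads the whole line, so the
# following accesses to neighboring addresses in the same line all hit.
class LineCache:
    def __init__(self):
        self.lines = set()            # cached line numbers (tags, simplified)
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        line = addr // LINE_SIZE      # all bytes of one line share this number
        if line in self.lines:
            self.hits += 1
        else:
            self.misses += 1
            self.lines.add(line)      # the whole line is fetched at once

cache = LineCache()
for addr in range(0, 256, 8):         # 32 sequential 8-byte accesses, 4 lines
    cache.access(addr)
print(cache.misses, cache.hits)       # 4 misses, 28 hits
```

Only one access per line misses; the sequential pattern turns the remaining 28 accesses into hits.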
To determine which cache line of the corresponding set to evict, there are several replacement policies:
The \revabbr{least recently used}{LRU} policy selects the cache line whose last usage lies furthest in the past.
An LRU algorithm is expensive to implement, as a counter value for every cache line of a set has to be updated every time the set is accessed.
\item
An alternative is a \revabbr{pseudo LRU}{PLRU} policy, where an extra bit is set to 1 every time a cache line is accessed.
When the extra bit of every cache line in a set is set to 1, they will get reset to 0.
In case of contention, the first cache line whose extra bit is 0 will be evicted, which indicates that its last usage was likely some time ago.
\item
In the \revabbr{least frequently used}{LFU} policy, every time a cache line is accessed, a counter value will be increased.
The cache line with the lowest value, the least frequently used one, will be chosen to be evicted.
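The one-bit PLRU bookkeeping described above can be sketched as follows; keeping the most recently accessed line marked after the reset round is one common variant, assumed here for illustration:

```python
# Toy sketch of a one-bit PLRU policy for a single cache set.
class PLRUSet:
    def __init__(self, ways):
        self.ways = ways
        self.bits = [0] * ways        # one "recently used" bit per way

    def touch(self, way):
        self.bits[way] = 1
        if all(self.bits):            # all bits set to 1: reset round
            self.bits = [0] * self.ways
            self.bits[way] = 1        # variant: keep the latest access marked

    def victim(self):
        # evict the first way whose bit is 0: it was not used recently
        return self.bits.index(0)

s = PLRUSet(4)
for way in (0, 1, 2):                 # way 3 is never touched
    s.touch(way)
print(s.victim())                     # prints 3
```

Unlike true LRU, only one bit per line is updated on an access, and the victim search is a simple priority pick over the zero bits.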
Also here, a write buffer can be used to place the actual write-back requests in a queue.
\subsection{Virtual Addressing}
\label{sec:caches_virtual_addressing}
Operating systems use virtual addressing to isolate the memory spaces of user space programs from each other, giving each process its own virtual address space.
\textit{Virtual addresses} are composed of a \textit{virtual page number} and a \textit{page offset}.
The virtual page number is the part that is actually virtual; the page offset is identical in the virtual and the physical address.
Figure \ref{fig:virtual_address} shows an exemplary division of a virtual address into its components.
\begin{figure}[!ht]
\begin{center}
\tikzfig{img/virtual_address}
\caption{Exemplary division of the virtual address into a virtual page number and page offset.}
\label{fig:virtual_address}
\end{center}
\end{figure}
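This division can be computed with simple bit operations; the 4 KiB page size below is an assumption for the sketch (it matches the common x86 case):

```python
PAGE_SIZE = 4096                           # 4 KiB pages (assumed page size)
OFFSET_BITS = PAGE_SIZE.bit_length() - 1   # 12 offset bits for 4 KiB pages

def split(vaddr):
    """Split a virtual address into virtual page number and page offset."""
    vpn = vaddr >> OFFSET_BITS             # translated part of the address
    offset = vaddr & (PAGE_SIZE - 1)       # identical in the physical address
    return vpn, offset

vpn, offset = split(0x402ABC)
print(hex(vpn), hex(offset))               # 0x402 0xabc
```

The physical address is then formed by replacing the virtual page number with the physical one while keeping the offset unchanged.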
Before a process can access a specific region in memory, the virtual page number has to be translated into a physical page number.
For this translation, so-called \textit{page tables} are used to look up the physical page number.
Page tables are usually multiple levels deep (e.g. four levels on x86-64), so a single translation can cause up to four memory accesses, which is expensive.
To improve performance, a \revabbr{translation lookaside buffer}{TLB} is used that acts like a cache on its own for physical page numbers.
However, as long as the physical address is not known, the data cache cannot look up its entries, as the index is not available yet.
So the cache has to wait for the TLB, or worse, for multiple memory accesses.
To circumvent this problem, the cache can be indexed by the virtual address, which makes it possible to parallelize both procedures.
Such a cache is called \textit{virtually indexed} and \textit{physically tagged} and is illustrated in figure \ref{fig:virtual_address_conversion}.
% Is the illustration from the book correct? Should the cache index really extend beyond the page offset?
\begin{figure}[!ht]
\begin{center}
\tikzfig{img/virtual_address_conversion}
\caption{Virtually indexed, physically tagged cache.\cite{Jacob2008} ASID refers to address-space identifier.}
\label{fig:virtual_address_conversion}
\end{center}
\end{figure}
The result from the TLB, the physical page number, needs to be compared to the tag that is stored in the cache.
When the tag and the physical page number match, the cache entry is valid for this virtual address.
Note that when the cache index is completely contained in the page offset, another problem called \textit{aliasing} is avoided, which will not be discussed further in this thesis.
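A virtually indexed, physically tagged lookup can be sketched as follows. The 64-byte lines, 64 sets, and the TLB contents are assumptions for the sketch; the index bits here lie entirely within the page offset, so the aliasing caveat above does not arise:

```python
OFFSET_BITS = 12                      # 4 KiB pages (assumption)
INDEX_BITS = 6                        # 64 cache sets (assumption)
tlb = {0x402: 0x9AB, 0x403: 0x111}    # toy TLB: VPN -> physical page number
cache_tags = {}                       # set index -> stored physical tag

def set_index(vaddr):
    # Index bits come from the untranslated part of the address (bits 6-11,
    # above the 64-byte line offset), so no translation is needed to read the set.
    return (vaddr >> 6) & ((1 << INDEX_BITS) - 1)

def vipt_lookup(vaddr):
    index = set_index(vaddr)              # cache set read: uses virtual bits
    ptag = tlb[vaddr >> OFFSET_BITS]      # TLB translation: in parallel in HW
    return cache_tags.get(index) == ptag  # hit iff the physical tags match

cache_tags[set_index(0x402ABC)] = 0x9AB   # a line of page 0x402 is resident
print(vipt_lookup(0x402ABC), vipt_lookup(0x403ABC))   # True False
```

The second lookup selects the same set, but the physical tag delivered by the TLB differs from the stored one, so it correctly misses.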
\subsection{Non-blocking Caches}
\label{sec:caches_non_blocking_caches}
In blocking caches, cache misses require the processor to stall until the data is fetched from the underlying memory.
As this is a major slowdown, non-blocking caches try to solve this problem, making it possible for the processor to make further progress while waiting on the value.
Similarly to the write buffer previously discussed in section \ref{sec:write_policies}, a new buffer will be introduced: the \revabbr{miss status holding register}{MSHR}.
The number of MSHRs corresponds to the number of misses the cache can handle concurrently; when all available MSHRs are occupied and a further miss occurs, the cache will block.
An MSHR entry always corresponds to one cache line that is currently being fetched from the underlying memory subsystem.
There are two variants of cache misses:
\textit{primary misses} are misses that lead to the occupation of another MSHR, whereas \textit{secondary misses} are added to an existing MSHR entry and therefore cannot cause the cache to block.
This is the case when an access targets a cache line that is already being fetched.
An architecture of a MSHR file is illustrated in figure \ref{fig:mshr_file}.
\begin{figure}[!ht]
\begin{center}
\tikzfig{img/mshr_file}
\caption{Miss status holding register file.\cite{Jahre2007} V refers to a valid bit.}
\label{fig:mshr_file}
\end{center}
\end{figure}
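The distinction between primary and secondary misses, and the blocking condition when all MSHRs are occupied, can be sketched as follows (illustrative names and sizes, not a concrete MSHR design):

```python
# Toy sketch of an MSHR file tracking outstanding cache-line fetches.
class MSHRFile:
    def __init__(self, entries, line_size=64):
        self.entries = entries        # number of MSHRs available
        self.line_size = line_size
        self.pending = {}             # line number -> waiting request addresses

    def handle_miss(self, addr):
        line = addr // self.line_size
        if line in self.pending:      # secondary miss: merge into the entry
            self.pending[line].append(addr)
            return "secondary"
        if len(self.pending) == self.entries:
            return "block"            # all MSHRs occupied: the cache stalls
        self.pending[line] = [addr]   # primary miss: allocate a new MSHR
        return "primary"

    def fill(self, line):
        # Data returned from memory: wake all merged requests, free the MSHR.
        return self.pending.pop(line)

m = MSHRFile(entries=2)
print(m.handle_miss(0x100))           # primary (new line, MSHR allocated)
print(m.handle_miss(0x108))           # secondary (same 64-byte line)
print(m.handle_miss(0x400))           # primary (second MSHR allocated)
print(m.handle_miss(0x800))           # block (no free MSHR left)
```

When the fill for a line arrives, all requests merged into its entry are satisfied at once and the MSHR becomes available for the next primary miss.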