Cache implementation

2022-05-30 20:14:17 +02:00
parent 207e1c8c1c
commit d951a3a0d0
5 changed files with 62 additions and 204 deletions

View File

@@ -11,14 +11,14 @@
 \node [style=align text] (8) at (-1.5, -6) {Time};
 \node [style=none] (9) at (0, -1) {};
 \node [style=none] (10) at (12, -1) {};
-\node [style=none] (11) at (12, -2) {};
-\node [style=none] (12) at (0, -2) {};
+\node [style=none] (11) at (12, -2.25) {};
+\node [style=none] (12) at (0, -2.25) {};
 \node [style=none] (13) at (0, -6) {};
 \node [style=none] (14) at (12, -6) {};
 \node [style=none] (15) at (0, -10) {};
 \node [style=none] (16) at (12, -10) {};
 \node [style=none] (17) at (2.5, -0.5) {BEGIN\_REQ};
-\node [style=none] (18) at (9.75, -1.5) {END\_RESP};
+\node [style=none] (18) at (9.75, -1.75) {END\_REQ};
 \node [style=none] (19) at (9.5, -5.5) {BEGIN\_RESP};
 \node [style=none] (20) at (2.25, -9.5) {END\_RESP};
 \node [style=none] (21) at (15.25, -6) {};

View File

@@ -9,6 +9,22 @@ They make it easier to test the product as VPs provide visiblity and controllabi
 SystemC is a C++ class library with an event-driven simulation kernel, used for developing complex system models (i.e. VPs) in a high-level language.
 It is defined under the IEEE 1666-2011 standard \cite{IEEE2012} and provided as an open-source library by Accellera.
+All SystemC modules inherit from the \texttt{sc\_module} base class.
+Those modules can be hierarchically composed of other modules or implement their functionality directly.
+Ports are then used to connect modules with each other, creating the structure of the simulation.
+There are two ways to implement a process in a module:
+% \begin{itemize}
+% \item
+An \texttt{SC\_METHOD} is sensitive to \texttt{sc\_event}s or other signals.
+It can be executed multiple times.
+% \item
+An \texttt{SC\_THREAD} is started at the beginning of the simulation and should not terminate.
+Instead, threads should contain infinite loops and explicitly call \texttt{wait()} to wait for a specific time or for events.
+% \end{itemize}
+Moreover, there is the \texttt{sc\_event\_queue}, which makes it possible to queue multiple pending events, whereas an \texttt{sc\_event} ignores further notifications until it is waited on.
+These concepts will become important in section \ref{sec:implementation}, where the implementation of several SystemC modules is discussed.
 SystemC supports numerous abstraction levels for modeling systems, namely \textit{cycle-accurate}, which is the most accurate abstraction but also the slowest, \textit{approximately-timed} and \textit{loosely-timed}.
 The latter two abstraction levels belong to \revabbr{transaction level modeling}{TLM}, which will be discussed in the next section \ref{sec:tlm}.
 One further abstraction level, \textit{untimed}, will not be a topic of this thesis.

View File

@@ -84,6 +84,7 @@ In a fully associative cache, a memory reference can be placed anywhere, consequ
 Although this policy has the highest potential cache hit rate, the high space consumption due to comparators and the high power consumption due to the lookup process make it infeasible for many systems.
 The hybrid approach of set-associative caches offers a trade-off between both policies.
+The term \textit{associativity} denotes the number of cache lines that are contained in a set.
 \subsection{Replacement Policies}
 \label{sec:replacement_policies}
@@ -118,6 +119,9 @@ To mitigate the problem, a write buffer can be used, which allows the processor
 An alternative is a so-called \textit{write-back} cache.
 Instead of writing the updated value immediately to the underlying memory, it will be written back when the corresponding cache line is evicted.
+To identify whether a cache line has to be written back, a so-called \textit{dirty bit} is used:
+it denotes whether the value has been updated while it has been in the cache.
+If so, the line has to be written back to ensure consistency; otherwise, the write-back is not needed.
 Here, too, a write buffer can be used to place the actual write-back requests into a queue.
 \subsection{Virtual Addressing}
@@ -183,13 +187,13 @@ As this is a major slowdown, non-blocking caches try to solve this problem, maki
 Similarly to the write buffer, previously discussed in \ref{sec:write_policies}, a new buffer will be introduced: the \revabbr{miss status hold register}{MSHR}.
 The number of MSHRs corresponds to the number of misses the cache can handle concurrently; when all available MSHRs are occupied and a further miss occurs, the cache will block.
-A MSHR entry always corresponds to one cache line that is currently being fetched from the underlying memory subsystem.
+An MSHR entry always corresponds to one cache line that is currently being fetched from the underlying memory subsystem.
 There are two variants of cache misses:
-\textit{Primary misses} are misses that lead to another occupation of a MSHR, where as \textit{secondary misses} are added to an existing MSHR entry and therefore cannot cause the cache to block.
+\textit{Primary misses} are misses that occupy a new MSHR, whereas \textit{secondary misses} are added to an existing MSHR entry and therefore cannot cause the cache to block.
 The latter is the case when the same cache line is accessed again.
-An architecture of a MSHR file is illustrated in figure \ref{fig:mshr_file}.
+An architecture of an MSHR file is illustrated in figure \ref{fig:mshr_file}.
 \begin{figure}[!ht]
 \begin{center}

View File

@@ -4,7 +4,7 @@
 In this section, the new components that were developed, which enable the tracing of an arbitrary application in real time as well as the replay of those traces in DRAMSys, will be introduced.
 At first, the DynamoRIO analyzer tool that produces the memory access traces and its place in the DrCacheSim framework will be explained.
-Furthermore, the trace player for DRAMSys will acquire special focus as well as the mandatory cache model that is used to model the cache-filtering in a real system.
+Furthermore, the new trace player for DRAMSys will receive special focus, as will the mandatory cache model that is used to model the cache filtering of a real system.
 % Or maybe not: ?
 The last part will concentrate on the special architecture of the new trace player interface and the challenges the internal interconnection solves.
@@ -22,7 +22,7 @@ The physical address conversion only works on Linux and requires root privileges
 The analyzer tool can either run alongside DrCacheSim (online) or operate on an internal trace format (offline).
 As of writing this thesis, the offline tracing mode does not yet support the physical address conversion, so the online mode has to be used.
-In case of the online tracing, DrCacheSim consists of two seperate processes:
+In case of online tracing, DrCacheSim consists of two separate processes:
 \begin{itemize}
 \item
 A client-side process (the DynamoRIO client) which injects observational instructions into the application's code cache.
@@ -145,11 +145,41 @@ While this does not take the type of the executed instructions into account, it
 \subsection{Non-Blocking Cache}
 \label{sec:cache_implementation}
-This section gives an overview over the cache model that is
-It is to note that the current implementation does not use a snooping protocol.
-Therefore, no cache coherency is guaranteed and memory shared between multiple processor cores will result in incorrect results as the values are not synchronized between the caches.
-However, it is to expect that this will not drastically affect the simulation results.
+This section gives an overview of the cache model that is used by the new trace player.
+It is implemented as a non-blocking cache that, as explained in section \ref{sec:caches_non_blocking_caches}, can accept new requests even while multiple cache misses are being handled.
+The cache inherits from the \texttt{sc\_module} base class and has a target socket to accept requests from the processor or a higher-level cache, as well as an initiator socket to send requests to a lower-level cache or to the DRAM subsystem.
+It has a configurable size, associativity, cache line size, MSHR buffer depth, write buffer depth and target depth per MSHR entry.
+To understand how the cache model works, a hypothetical request from the CPU is assumed in order to explain the internal processing of the transaction in detail:
+When the transaction arrives, it is placed in the PEQ of the cache, from where the handler for the \texttt{BEGIN\_REQ} phase is called.
+The handler verifies that the cache buffers are not full\footnote{Otherwise the cache applies back pressure on the CPU and postpones the handling of the transaction.} and checks whether the requested data is stored in the cache.
+If this is the case (i.e. a cache hit), the cache model immediately sends an \texttt{END\_REQ} and, when the target socket is not currently occupied with a response, accesses the cache\footnote{In case of a read transaction, the content of the cache line is copied into the transaction; in case of a write transaction, the cache line is updated with the new value.} and sends the \texttt{BEGIN\_RESP} phase to the processor.
+The processor then finalizes the transaction with \texttt{END\_RESP}, the target-side back pressure of the cache is cleared, and the postponed request from the CPU (if it exists) is placed into the PEQ again.
+When, on the other hand, the requested data is not in the cache (i.e. a cache miss), it is first checked whether there is already an existing MSHR entry for the corresponding cache line.
+If this is the case\footnote{And if the target list of the MSHR entry is not full; otherwise, the transaction is postponed.}, the transaction is appended to it as an additional target.
+If not, a cache line is evicted\footnote{When an eviction is not possible, the transaction is postponed.} to make space for the new cache line that will be fetched from the underlying memory.
+When the \texttt{dirty} flag of the old cache line is set, it has to be placed into the write buffer and written back to the memory.
+The newly evicted cache line is now \textit{allocated}, but not \textit{valid}.
+Then, the transaction is put into an MSHR entry and the \texttt{END\_REQ} phase is sent back to the processor.
+To process the entries in the MSHR and in the write buffer, the \texttt{processMshrQueue()} and \texttt{processWriteBuffer()} methods are called at appropriate times.
+In the former, a not-yet-issued MSHR entry is selected, for which a new fetch transaction is generated and sent to the underlying memory.
+Note that special care has to be taken when the requested cache line is also present in the write buffer:
+to ensure consistency, no new request is sent to the DRAM; instead, the value is snooped out of the write buffer.
+In the latter, the processing of the write-back buffer, a not-yet-issued entry is selected and a new write transaction is sent to the memory.\footnote{Both \texttt{processMshrQueue()} and \texttt{processWriteBuffer()} also need to ensure that no back pressure is currently applied onto the cache from the memory side.}
+Incoming transactions from the memory side are accepted with an \texttt{END\_RESP} and, in case of a fetch transaction, used to update the cache contents and possibly to prepare a new response transaction for the processor, as described before.
+This example works analogously with another cache as the requesting module or another cache as the target module for fetch or write-back accesses.
+Note that the current implementation does not utilize a snooping protocol.
+Therefore, cache coherency is not guaranteed, and memory shared between multiple processor cores will produce incorrect results, as the values are not synchronized between the caches.
+However, it is expected that this will not drastically affect the simulation results for applications with few shared resources.
+The implementation of a snooping protocol is a candidate for future improvements.
+%However, it is to expect that this will not drastically affect the simulation results.
 \subsection{A New Trace Player Interface}
 \label{sec:traceplayer_interface}

View File

@@ -1,195 +1,3 @@
 \section{Appendix}
 \label{sec:appendix}
-\begin{listing}[H]
-\begin{perlcode}
-#!/usr/bin/perl
-use strict;
-use warnings;
-use POSIX;
-# University of Kaiserslautern 2014
-# Matthias Jung
-# Christian Weis
-# Peter Ehses
-# programm call: perl error_detecta.pl input output
-my $input = $ARGV[0];
-my $id = "022804e800";
-my $pattern = hex("FFFFFFFF"); # data pattern can be changed to AAAAAAAA or 55555555
-my $errors = 0;
-my $i = 1;
-my $addr = 0; # DRAM address in hex
-my $addrb = 0; # 23 bit in binary
-my $addroffset = 536870912; # offset for the address in hex 0x20000000
-my $bank =0; # 2 bits for the bank
-my $bankshift =21; # shiftoperator
-my $bankand = 6291456; # 2^22+2^21
-my $row =0; # 12 bits for the row
-my $rowshift =9; # shiftoperator
-my $rowand = 2096640; # 2^20+...+2^9
-my $column =0; # 9 bits for the column
-my $columnand = 511; # 2^8+...+2^0
-open(IFH, $input);
-open(out_file, ">errorout_$ARGV[1]");
-printf out_file ("The following table shows the addresses ");
-printf out_file ("from the errors in the wideIO SDRAM.\n");
-printf out_file ("Addresses in binary \t\t\t Addresses in hexadecimal\n");
-printf out_file ("bank \t row \t\t column \t SDRAM address \t data value\n");
-while(<IFH>)
-{
-unless($_ =~ /\[.*\]/ || $_ =~ /$id/)
-{
-my $value = $_;
-chop($value);
-$value = substr( $value , 2);
-my $result = sprintf("%0b", (hex($value) ^ $pattern));
-for(my $j = 0; $j < length($result); $j ++)
-{
-if(substr( $result, $j , 1 ) eq "1")
-{
-$errors++;
-$addrb = $i-((ceil($i/11))*3);
-$addr = $addrb + $addroffset;
-$bank = ($addrb & $bankand) >> $bankshift;
-$row = ($addrb & $rowand) >> $rowshift;
-$column = ($addrb & $columnand);
-printf out_file ("%02b\t %012b\t %09b\t %#8x\t $value\n", $bank, $row, $column, $addr);
-}
-}
-}
-$i++;
-}
-close(out_file);
-print "Errors = ".$errors."\n";
-close(IFH);
-\end{perlcode}
-\caption{Perl script to find errors for data pattern F, A or 5}
-\label{lis:5af}
-\end{listing}
-\pagebreak
-Pagebreak and linebreak has to be done manually with pygmentize, this feature is
-not yet implemented. Open the appendix.tex file and see the source code
-afterwards how the pagebreak is done. For that the appendix.tex has to be
-written with pagebreaks, so that the layout of the pages is done manually.
-Linebreaks are easier to do, just check that the lines are in the box of the pdf
-file, otherwise make a linebreak yourself.
-\pagebreak
-%\begin{listing}[H]
-\begin{minted}[linenos, bgcolor=light-gray, fontsize=\scriptsize]{perl}
-#!/usr/bin/perl
-use strict;
-use warnings;
-use POSIX;
-use Chart::Gnuplot;
-# University of Kaiserslautern 2014
-# Matthias Jung
-# Christian Weis
-# Peter Ehses
-# call programm: perl plotreffff_0xf.pl dfile1 dfile2 dfile3
-my $i = 0;
-my $line = 3;
-my $addr = 0; # DRAM address in hex
-my $bankb; # 2 bits for the bank
-my $rowb; # 12 bits for the row
-my $columnb; # 9 bits for the column
-my $bank; # banknumber in decimal
-my $row; # rownumber in decimal
-my $column; # columnnumber in decimal
-my $value;
-my @ytics = [0, 25,50,75,100,125,150,175,200,225,250,275,300,325,350,375,400,425,450,475,500];
-my @xtics = [0,250,500,750,1000,1250,1500,1750,2000,2250,2500,2750,3000,3250,3500,3750,4000];
-# set terminal to svg format
-my $terminal = 'svg mouse jsdir '.'"http://gnuplot.sourceforge.net/demo_svg"';
-sub bin2dec {return unpack("N", pack("B32", substr("0" x 32 . shift, -33)));}
-my @row_array;
-my @column_array;
-foreach my $argnum (0 .. $#ARGV)
-{
-open(IFH, $ARGV[$argnum]);
-$i = 0;
-while(<IFH>)
-{
-chomp;
-$i++;
-if ($i > $line)
-{
-($bankb, $rowb, $columnb, $addr, $value) = split("\t");
-$bank = (bin2dec($bankb));
-$row = (bin2dec($rowb));
-$column = (bin2dec($columnb));
-if ($argnum == 0)
-{
-push(@{$row_array[$bank]}, $row);
-push(@{$column_array[$bank]}, $column);
-}
-if ($argnum == 1)
-{
-push(@{$row_array[$bank+4]}, $row);
-push(@{$column_array[$bank+4]}, $column);
-}
-if ($argnum == 2)
-{
-push(@{$row_array[$bank+8]}, $row);
-push(@{$column_array[$bank+8]}, $column);
-}
-}
-}
-close(IFH);
-}
-for (my $count = 1; $count < 5; $count++)
-{
-$bank = $count -1;
-\end{minted}
-%here is a pagebreak, and the next line of the code is starting with 70, has to be specified with minted like below.
-\begin{listing}[H]
-\begin{minted}[linenos, bgcolor=light-gray, fontsize=\scriptsize, firstnumber=70]{perl}
-my $plot1 = Chart::Gnuplot->new(
-terminal => $terminal, output => "plot_ref202ms_0xf_b_$count.svg",
-title => "Errors channel 3 of SDRAM, bank $count, data pattern 0xffffffff and refresh 202 ms",
-imagesize => '1024, 768', xlabel => "row address", ylabel => "column address", yrange=>[0, 511],
-xrange=>[0, 4095], ytics => {labels => @ytics}, xtics => {labels => @xtics},
-legend => {position => "outside center bottom", order =>"horizontal reverse",
-border => "on", align => "left"}
-);
-my $dataSet1 = Chart::Gnuplot::DataSet->new(
-xdata => \@{$row_array[$bank]}, ydata => \@{$column_array[$bank]},
-color => "blue", pointtype => 6, pointsize => 1.75, width => 2,
-title => "95 degree C"
-);
-my $dataSet2 = Chart::Gnuplot::DataSet->new(
-xdata => \@{$row_array[$bank+4]}, ydata => \@{$column_array[$bank+4]},
-color => "red", pointtype => 8, pointsize => 1.25, width => 2,
-title => "100 degree C"
-);
-my $dataSet3 = Chart::Gnuplot::DataSet->new(
-xdata => \@{$row_array[$bank+8]}, ydata => \@{$column_array[$bank+8]},
-color => "dark-green", pointtype => 10, pointsize => 1.25, width => 2,
-title => "105 degree C"
-);
-if (@{$row_array[$bank]}){$plot1->plot2d($dataSet1, $dataSet2, $dataSet3);}
-if (!@{$row_array[$bank]}){$plot1->plot2d($dataSet2, $dataSet3);}
-if (!@{$row_array[$bank]} && !@{$row_array[$bank+4]}){$plot1->plot2d($dataSet3);}
-}
-\end{minted}
-\caption{Perl script for scatter plot of different refresh periods}
-\label{lis:plotref}
-\end{listing}