Some Interconnect additions

This commit is contained in:
2022-07-03 22:38:37 +02:00
parent 02825af7ab
commit 8be5d711f9


@@ -95,7 +95,7 @@ Also, to be able to decode the instructions in the online tracing, a set of patc
\subsection{Trace Player Architecture}
\label{sec:dbiplayer_architecture}
This section covers the general architecture of the \textit{DbiPlayer}, the new trace player for DRAMSys that replays the captured trace files.
For every recorded thread, a new so-called DbiThreadPlayer is spawned, which is a standalone initiator for transactions.
Because these threads need to be synchronized to approximate the real behavior, they need to communicate with each other.
@@ -104,12 +104,12 @@ This communication, however, brings up the necessity to containerize the thread
With the old DRAMSys interface for trace players this was not easily realizable, so a new generic initiator interface was developed that makes it possible to connect components with an arbitrary internal architecture to DRAMSys.
This new interface is discussed further in section \ref{sec:traceplayer_interface}.
For the \textit{DbiPlayer}, an additional interconnect module bundles all \\ \texttt{simple\_initiator\_sockets} into a single \texttt{multi\_passthrough\_initiator\_socket}, as presented in figure \ref{fig:dbiplayer_without_caches}.
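The bundling idea can be sketched in plain C++ (illustrative names only, not the actual DRAMSys/TLM classes): every thread player obtains its own binding index on the multi-socket, and all transactions are funneled through one downstream path, tagged with the source index so responses can later be routed back.

```cpp
#include <cstdint>
#include <functional>
#include <utility>

// Simplified stand-in for a transaction payload.
struct Transaction {
    std::uint64_t address;
    bool is_write;
};

// Minimal sketch of an interconnect that bundles many initiators into
// one downstream target, mirroring what a multi_passthrough socket does.
class BundlingInterconnect {
public:
    using TargetFn = std::function<void(int source, const Transaction&)>;

    explicit BundlingInterconnect(TargetFn target) : target_(std::move(target)) {}

    // Models binding one simple_initiator_socket to the multi-socket:
    // each caller receives a unique binding index.
    int bind() { return next_index_++; }

    // Forward path: tag the transaction with its source binding and hand
    // it to the single downstream target (the DRAMSys side).
    void forward(int source, const Transaction& trans) { target_(source, trans); }

private:
    TargetFn target_;
    int next_index_ = 0;
};
```

The source index recorded here is what later allows a response arriving on the single downstream path to be routed back to the correct initiator.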
\begin{figure}
\begin{center}
\tikzfig{img/without_caching}
\caption{Architecture of the \textit{DbiPlayer} without caches.}
\label{fig:dbiplayer_without_caches}
\end{center}
\end{figure}
@@ -117,13 +117,13 @@ For the \texttt{DbiPlayer}, an additional interconnect module will bundle up all
As the memory accesses are directly extracted from the executed instructions, simply sending a transaction to the DRAM subsystem for every data reference would completely neglect the caches of today's processors.
Therefore, a cache model is also required, whose implementation is explained in more detail in section \ref{sec:cache_implementation}.
Many modern cache hierarchies comprise three cache levels: two private caches per processor core, the L1 and L2 caches, and one cache shared across all cores, the L3 cache.
This hierarchy is also reflected in the \textit{DbiPlayer}, as shown in figure \ref{fig:dbiplayer_with_caches}.
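The wiring of this hierarchy can be illustrated with a small sketch (the struct names are assumptions for illustration, not actual DRAMSys classes): each core owns a private L1 and L2, and all L2 caches point to one shared L3.

```cpp
#include <memory>
#include <vector>

// One cache level; next_level is nullptr for the last level
// before the DRAM subsystem.
struct Cache {
    int level;
    Cache* next_level;
};

// Private L1 and L2 of a single core; the L2 forwards to the shared L3.
struct CoreCaches {
    Cache l1, l2;
    explicit CoreCaches(Cache* shared_l3) : l1{1, &l2}, l2{2, shared_l3} {}
};

// The full three-level hierarchy: per-core L1/L2, one shared L3.
struct CacheHierarchy {
    Cache l3{3, nullptr};
    std::vector<std::unique_ptr<CoreCaches>> cores;
    explicit CacheHierarchy(int core_count) {
        for (int i = 0; i < core_count; ++i)
            cores.push_back(std::make_unique<CoreCaches>(&l3));
    }
};
```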
\begin{landscape}
\begin{figure}
\begin{center}
\tikzfig{img/with_caching}
\caption{Architecture of the \textit{DbiPlayer} with caches.}
\label{fig:dbiplayer_with_caches}
\end{center}
\end{figure}
@@ -132,9 +132,9 @@ This hierarchy is also reflected in the \texttt{DbiPlayer} as shown in Figure \r
\subsection{Trace Player Functionality}
\label{sec:dbiplayer_functionality}
With the overall architecture of the initiator introduced, this section explains the internal functionality of the \textit{DbiPlayer} and its threads.
The threads of the \textit{DbiPlayer} are specialized initiator modules that inherit from the more generic \texttt{TrafficInitiatorThread} class.
Each \texttt{TrafficInitiatorThread} consists of a \texttt{sendNextPayloadThread()} \texttt{SC\_THREAD} that in turn calls the virtual method \texttt{sendNextPayload()}, implemented in the \texttt{DbiThreadPlayer}, each time the \texttt{sc\_event\_queue} \texttt{sendNextPayloadEvent} is notified.
Each \texttt{DbiThreadPlayer} iterates through its trace file and stores the entries in an internal buffer.
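The dispatch mechanism described above can be modeled without SystemC as follows (a simplified sketch: a plain counter stands in for the queued \texttt{sc\_event\_queue} notifications, and the address values are illustrative assumptions, not trace data).

```cpp
#include <cstdint>
#include <vector>

// Base class: one virtual sendNextPayload() call per queued notification,
// mirroring the SC_THREAD driven by sendNextPayloadEvent.
class TrafficInitiatorThread {
public:
    virtual ~TrafficInitiatorThread() = default;

    // Models sendNextPayloadEvent.notify(): queue one more dispatch.
    void notify() { ++pending_; }

    // Models the sendNextPayloadThread() SC_THREAD body.
    void run() {
        while (pending_ > 0) {
            --pending_;
            sendNextPayload();
        }
    }

protected:
    virtual void sendNextPayload() = 0;

private:
    int pending_ = 0;
};

// Derived thread player overriding the virtual hook.
class DbiThreadPlayer : public TrafficInitiatorThread {
public:
    std::vector<std::uint64_t> sent;  // addresses issued so far

protected:
    void sendNextPayload() override {
        // In the real player this would pop the next trace entry from the
        // internal buffer and issue a TLM transaction.
        sent.push_back(next_address_);
        next_address_ += 64;  // assumed cache-line stride, illustrative only
    }

private:
    std::uint64_t next_address_ = 0x1000;
};
```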
@@ -147,7 +147,7 @@ While this does not take the type of the executed instructions into account, it
As mentioned previously, the threads cannot run by themselves; rather, they require synchronization to ensure the simulated system replicates the real running application as closely as possible.
The analysis tool appends timestamps to the memory access traces that are used to pause the execution of a thread when the global time has not yet advanced that far, or to advance the global time when the thread is allowed to run.
Note that the term global time in this context does not correspond to the SystemC simulation time but denotes a loose time variable that the \textit{DbiPlayer} uses to schedule its threads.
A set of rules determines whether a thread is allowed to make progress beyond a timestamp that lies further than the current global time:
\begin{enumerate}
@@ -244,4 +244,18 @@ This information, however, is not propagated to the other caches, leading to an
To solve this problem, the MultiSimpleCoupler only forwards requests to the L3 cache when it is able to accept them. To solve this problem, the MultiSimpleCoupler only forwards requests to the L3 cache when it is able to accept them.
If this is not the case, the request gets internally buffered and forwarded when an earlier request is being completed with the \texttt{END\_REQ} phase. If this is not the case, the request gets internally buffered and forwarded when an earlier request is being completed with the \texttt{END\_REQ} phase.
% Example
To illustrate this further, consider a simple example:
One L2 cache needs to request a cache line from the underlying L3 cache.
The MultiSimpleCoupler receives the \texttt{BEGIN\_REQ} phase and places the payload into its PEQ.
From there, an internal routing table is updated so that the response can later be sent back through the correct multi-socket binding.
Since the L3 cache is currently not applying back pressure on the interconnect, the interconnect can forward the transaction with the \texttt{BEGIN\_REQ} phase to the L3 cache.
Until the L3 cache responds with the \texttt{END\_REQ} phase, the interconnect defers any new request from any L2 cache and buffers the payload objects in an internal data structure.
When the \texttt{END\_REQ} phase is received, the next transaction from this request buffer is sent to the L3 cache.
After some time, the L3 cache responds with the requested cache line.
During this \texttt{BEGIN\_RESP} phase, the L2 cache that requested this line is looked up using the routing table and the payload is sent back to it.
Until the L2 cache responds with an \texttt{END\_RESP}, the exclusion rule also has to be honored here:
When a new response from the L3 cache is received, it has to be buffered into another internal data structure until the corresponding target socket binding is clear again.
Once the L2 cache sends out the \texttt{END\_RESP} phase, the interconnect forwards the \texttt{END\_RESP} to the L3 cache and initiates new response transactions in case the response buffer is not empty.
In conclusion, this special interconnect module with a multi-target socket and a simple initiator socket ensures that the exclusion rule is respected in both directions.
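The request-direction buffering walked through above can be condensed into a small model (a hypothetical, heavily simplified sketch: the names and structure are illustrative, not the actual DRAMSys code, and the symmetric response buffering is omitted for brevity).

```cpp
#include <cstdint>
#include <deque>
#include <optional>

// Simplified payload: which L2 cache issued the request, and for what address.
struct Payload {
    int l2_id;
    std::uint64_t address;
};

class MultiSimpleCoupler {
public:
    // BEGIN_REQ from an L2 cache: forward to the L3 cache if the path is
    // free, otherwise buffer the payload (the exclusion rule).
    bool beginReq(const Payload& p) {
        if (l3_busy_) {
            request_buffer_.push_back(p);
            return false;               // deferred
        }
        l3_busy_ = true;
        in_flight_.push_back(p.l2_id);  // routing entry for the response
        return true;                    // forwarded to the L3 cache
    }

    // END_REQ from the L3 cache frees the path; the next buffered request
    // (if any) is forwarded immediately.
    std::optional<Payload> endReq() {
        l3_busy_ = false;
        if (request_buffer_.empty()) return std::nullopt;
        Payload next = request_buffer_.front();
        request_buffer_.pop_front();
        beginReq(next);  // path is free, so this always forwards
        return next;
    }

    // BEGIN_RESP from the L3 cache: look up which L2 cache receives the
    // response (responses complete in request order in this model).
    int beginResp() {
        int l2 = in_flight_.front();
        in_flight_.pop_front();
        return l2;
    }

private:
    bool l3_busy_ = false;
    std::deque<Payload> request_buffer_;
    std::deque<int> in_flight_;
};
```

The in-flight queue plays the role of the routing table from the example: it remembers, per outstanding request, which multi-socket binding the eventual response must travel back through.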