Some Interconnect additions
@@ -95,7 +95,7 @@ Also, to be able to decode the instructions in the online tracing, a set of patc
\subsection{Trace Player Architecture}
\label{sec:dbiplayer_architecture}

This section covers the general architecture of the \textit{DbiPlayer}, the new trace player for DRAMSys that replays the captured trace files.

For every recorded thread, a new so-called \texttt{DbiThreadPlayer} is spawned, which is a standalone initiator for transactions.
Because these threads need to be synchronized to approximate the real behavior, they need to communicate with each other.
@@ -104,12 +104,12 @@ This communication, however, brings up the necessity to containerize the thread
With the old DRAMSys interface for trace players this was not easily realizable, so a new generic initiator interface was developed that makes it possible to connect components with an arbitrary internal architecture to DRAMSys.
This new interface is discussed further in section \ref{sec:traceplayer_interface}.

For the \textit{DbiPlayer}, an additional interconnect module bundles all \\ \texttt{simple\_initiator\_sockets} into a single \texttt{multi\_passthrough\_initiator\_socket}, as presented in figure \ref{fig:dbiplayer_without_caches}.

\begin{figure}
\begin{center}
\tikzfig{img/without_caching}
\caption{Architecture of the \textit{DbiPlayer} without caches.}
\label{fig:dbiplayer_without_caches}
\end{center}
\end{figure}
@@ -117,13 +117,13 @@ For the \texttt{DbiPlayer}, an additional interconnect module will bundle up all
As the memory accesses are extracted directly from the executed instructions, simply sending a transaction to the DRAM subsystem for every data reference would completely neglect the caches of today's processors.
Therefore, a cache model is also required, whose implementation is explained in more detail in section \ref{sec:cache_implementation}.
Many modern cache hierarchies comprise three cache levels: two caches for every processor core, the L1 and the L2 cache, and one cache that is shared across all cores, the L3 cache.
This hierarchy is also reflected in the \textit{DbiPlayer} as shown in Figure \ref{fig:dbiplayer_with_caches}.
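The three-level lookup described above can be illustrated with a plain C++ sketch. This is not the DRAMSys cache implementation: the names, the fully associative line sets, and the fill policy are simplifying assumptions made only to show how per-core L1/L2 caches sit in front of one shared L3, with a miss on all three levels reaching the DRAM subsystem.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Minimal placeholder cache: just a set of cached line addresses.
struct Cache {
    std::unordered_set<uint64_t> lines;
    bool contains(uint64_t line) const { return lines.count(line) != 0; }
    void insert(uint64_t line) { lines.insert(line); }
};

// Per-core L1/L2 in front of one shared L3.
struct Hierarchy {
    std::vector<Cache> l1, l2;  // one pair per core
    Cache l3;                   // shared across all cores

    explicit Hierarchy(int cores) : l1(cores), l2(cores) {}

    // Returns the level that hit (1..3), or 4 for a DRAM access.
    int access(int core, uint64_t line) {
        if (l1[core].contains(line)) return 1;
        if (l2[core].contains(line)) return 2;
        int level = l3.contains(line) ? 3 : 4;
        // Fill all levels on the way back (illustrative policy only).
        l3.insert(line);
        l2[core].insert(line);
        l1[core].insert(line);
        return level;
    }
};
```

With two cores, a first access from core 0 misses everywhere, a repeat from core 0 hits its L1, and the same line requested by core 1 hits only in the shared L3.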

\begin{landscape}
\begin{figure}
\begin{center}
\tikzfig{img/with_caching}
\caption{Architecture of the \textit{DbiPlayer} with caches.}
\label{fig:dbiplayer_with_caches}
\end{center}
\end{figure}
@@ -132,9 +132,9 @@ This hierarchy is also reflected in the \texttt{DbiPlayer} as shown in Figure \r
\subsection{Trace Player Functionality}
\label{sec:dbiplayer_functionality}

With the overall architecture of the initiator introduced, this section explains the internal functionality of the \textit{DbiPlayer} and its threads.

The threads of the \textit{DbiPlayer} are specialized initiator modules that inherit from the more generic \texttt{TrafficInitiatorThread} class.
Each \texttt{TrafficInitiatorThread} consists of a \texttt{sendNextPayloadThread()} \texttt{SC\_THREAD} that in turn calls the virtual method \texttt{sendNextPayload()}, implemented in the \texttt{DbiThreadPlayer}, each time the \texttt{sc\_event\_queue} \texttt{sendNextPayloadEvent} is notified.
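The base-class/hook relationship can be sketched in plain C++ without SystemC. In the real code the loop is an \texttt{SC\_THREAD} woken by an \texttt{sc\_event\_queue}; here a simple pending-event counter stands in for that queue, and the trace entries and method bodies are illustrative assumptions, not the DRAMSys implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Sketch of the TrafficInitiatorThread pattern: notifications on a
// stand-in for sendNextPayloadEvent trigger the virtual hook
// sendNextPayload(), which the concrete DbiThreadPlayer overrides.
class TrafficInitiatorThread {
public:
    virtual ~TrafficInitiatorThread() = default;

    // Stand-in for notifying the sc_event_queue sendNextPayloadEvent.
    void notifySendNextPayloadEvent() { ++pendingEvents_; }

    // Stand-in for the sendNextPayloadThread() SC_THREAD body:
    // one sendNextPayload() call per queued notification.
    void sendNextPayloadThread() {
        while (pendingEvents_ > 0) {
            --pendingEvents_;
            sendNextPayload();
        }
    }

protected:
    virtual void sendNextPayload() = 0;

private:
    int pendingEvents_ = 0;
};

// Concrete player: replays entries from its pre-buffered trace.
class DbiThreadPlayer : public TrafficInitiatorThread {
public:
    explicit DbiThreadPlayer(std::vector<std::string> trace)
        : trace_(std::move(trace)) {}

    const std::vector<std::string>& sent() const { return sent_; }

protected:
    void sendNextPayload() override {
        if (next_ < trace_.size())
            sent_.push_back(trace_[next_++]);  // "send" one transaction
    }

private:
    std::vector<std::string> trace_;
    std::vector<std::string> sent_;
    std::size_t next_ = 0;
};
```

Each notification thus releases exactly one payload, which is what lets the scheduling logic below throttle a thread by simply withholding notifications.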

Each \texttt{DbiThreadPlayer} iterates through its trace file and stores the entries in an internal buffer.
@@ -147,7 +147,7 @@ While this does not take the type of the executed instructions into account, it

As mentioned previously, the threads cannot run by themselves; rather, they require synchronization to ensure that the simulated system replicates the real running application as closely as possible.
The analysis tool appends timestamps to the memory access traces, which are used either to pause the execution of a thread when the global time has not yet advanced that far, or to advance the global time when the thread is allowed to run.
Note that the term global time in this context does not correspond to the SystemC simulation time but denotes a loose time variable that the \textit{DbiPlayer} uses to schedule its threads.
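The pause/advance behavior of this global time variable can be condensed into a small sketch. This is an assumption-laden illustration of the mechanism just described, not the actual scheduler: a thread whose next trace timestamp lies beyond the global time must pause, and the thread that is allowed to run pulls the global time forward to its timestamp.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Illustrative global-time gate for the DbiPlayer's threads.
struct GlobalClock {
    uint64_t globalTime = 0;

    // A thread may proceed only if its next trace timestamp
    // does not lie beyond the current global time.
    bool mayProceed(uint64_t nextTimestamp) const {
        return nextTimestamp <= globalTime;
    }

    // The thread that is allowed to run advances the global time.
    void advance(uint64_t threadTimestamp) {
        globalTime = std::max(globalTime, threadTimestamp);
    }
};
```

A paused thread is simply re-checked after another thread has advanced the global time past its timestamp.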

A set of rules determines whether a thread is allowed to make progress beyond a timestamp that lies further ahead than the current global time:
\begin{enumerate}
@@ -244,4 +244,18 @@ This information, however, is not propagated to the other caches, leading to an
To solve this problem, the \texttt{MultiSimpleCoupler} only forwards requests to the L3 cache when it is able to accept them.
If this is not the case, the request is buffered internally and forwarded once an earlier request has been completed with the \texttt{END\_REQ} phase.

% Example
To illustrate this further, consider a simple example:
An L2 cache needs to request a cache line from the underlying L3 cache.
The \texttt{MultiSimpleCoupler} receives the \texttt{BEGIN\_REQ} phase and places it into its PEQ.
From there, an internal routing table is updated so that the response can later be sent back through the correct multi-socket binding.
As the L3 cache is currently not applying back pressure on the interconnect, the coupler can forward the transaction with the \texttt{BEGIN\_REQ} phase to the L3 cache.
Until the L3 cache responds with the \texttt{END\_REQ} phase, the interconnect defers any new request from any L2 cache and buffers the payload objects in an internal data structure.
When the \texttt{END\_REQ} phase is received, the next transaction from this request buffer is sent to the L3 cache.
After some time, the L3 cache will respond with the requested cache lines.
During this \texttt{BEGIN\_RESP} phase, the L2 cache that requested the line is looked up in the routing table and the payload is sent back to it.
Until the L2 cache responds with an \texttt{END\_RESP}, the exclusion rule has to be honored here as well:
when a new response from the L3 cache is received, it has to be buffered in another internal data structure until the corresponding target socket binding is free again.
Once the L2 cache sends out the \texttt{END\_RESP} phase, the interconnect forwards the \texttt{END\_RESP} to the L3 cache and initiates new response transactions in case the response buffer is not empty.
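The request direction of this walkthrough can be condensed into a sketch of the coupler's bookkeeping. This is plain C++ with no TLM types; the structure names and interfaces are illustrative assumptions, not the DRAMSys code. It shows the three pieces the text describes: at most one request in flight toward L3, a buffer for deferred \texttt{BEGIN\_REQ}s, and a routing table that remembers the originating L2 socket for each transaction.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <optional>
#include <unordered_map>

// Illustrative model of the MultiSimpleCoupler's request path.
struct Coupler {
    std::optional<uint64_t> inFlight;           // txn currently at L3
    std::deque<uint64_t> requestBuffer;         // deferred BEGIN_REQs
    std::unordered_map<uint64_t, int> routing;  // txn -> L2 socket id

    // BEGIN_REQ from L2 socket `from`; returns true if forwarded now.
    bool beginReq(uint64_t txn, int from) {
        routing[txn] = from;          // update the routing table
        if (!inFlight) {              // L3 can accept the request
            inFlight = txn;
            return true;
        }
        requestBuffer.push_back(txn); // exclusion rule: defer it
        return false;
    }

    // END_REQ from L3: the next buffered request may be forwarded.
    std::optional<uint64_t> endReq() {
        inFlight.reset();
        if (requestBuffer.empty()) return std::nullopt;
        inFlight = requestBuffer.front();
        requestBuffer.pop_front();
        return inFlight;
    }

    // BEGIN_RESP from L3: route back to the originating L2 socket.
    int beginResp(uint64_t txn) { return routing.at(txn); }
};
```

The response direction described above mirrors this pattern with its own buffer, gating \texttt{BEGIN\_RESP}s on the \texttt{END\_RESP} of the previous response instead.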

In conclusion, this special interconnect module with a multi-target socket and a simple initiator socket ensures that the exclusion rule is respected in both directions.