\section{System-Level Modeling}
\label{sec:vp}
To evaluate the impact of \ac{pim} on the performance and power consumption of various applications, it is essential to perform simulations.
Such simulations allow investigating critical factors such as the \ac{pim} microkernel setup overhead and the actual performance improvement of the \ac{pim} kernel compared to traditional platforms.
They may even allow the identification of potential improvements to the \ac{pim} architecture.
In addition, the suitability of different applications for \ac{pim} can be evaluated, as well as the influence of the specific memory layout requirements on the application software.
\subsection{Virtual Prototypes}
To perform such simulations, it is necessary to use a simulation model, commonly referred to as a \ac{vp}.
\Acp{vp} act as executable software models of a physical hardware system, allowing the architecture of the system to be completely simulated in software.
This in turn enables the software development and the identification of potential platform-specific software bugs without the need for the actual hardware implementation \cite{antonino2018}.
\Acp{vp} provide full visibility and control over the entire simulated system, helping to identify bottlenecks and potential specification errors in the design.
They also allow the exploration of the design space. In the case of \aca{fimdram}, for example, this includes varying the ratio of \ac{pim} units to memory banks and observing the effect on the performance of the \ac{pim} microkernel.
However, choosing an appropriate level of abstraction for the software model is critical: it must be detailed enough to allow well-informed statements about the system, yet not so detailed, as at the \ac{rtl}, that the simulation speed of the model itself suffers.
A viable compromise is the \ac{at} abstraction level within the \ac{tlm} technique, which is widely used in the SystemC \cite{systemc2023} virtual prototyping standard.
The \ac{at} coding style simplifies the modeling of communication between system components by representing it solely as function calls that are synchronized at well-defined points in time.
This approach eliminates the need to simulate complex bus protocols while maintaining the accuracy required for design space exploration and performance evaluation.
Two different \ac{vp} simulation frameworks used in the implementation of the \aca{fimdram} software model, namely gem5 and DRAMSys, are introduced in the following sections.
\subsection{The gem5 Simulator}
The gem5 simulator is an open-source computer architecture simulation platform used for system-level architecture research \cite{lowe-power2020}.
This powerful platform allows the measurement of various statistics, including runtime, memory bandwidth, and internal processor metrics across different hardware configurations.
The gem5 simulator executes a user application on its sophisticated, timing-accurate processor models.
It consists of a simulator core and parameterized models for many components, including out-of-order processors, bus systems, and \ac{dram}.
As a result, gem5 provides a comprehensive framework for simulating and analyzing complex computer systems.
Two different modes can be used with gem5: full system simulation and system call emulation.
In full system mode, gem5 boots a complete operating system kernel and runs the user's application on top of it to generate detailed statistics.
In system call emulation mode, the simulated application uses the syscall interface of the host operating system, ignoring the timing effects of these operating system routines.
Consequently, the full system mode provides better accuracy, while the system call emulation mode takes less time to complete the simulation.
In addition to the integrated components of the platform, gem5 provides a SystemC \ac{api} to enable the use of external SystemC models.
An example of such an external model is the \ac{dram} simulator DRAMSys, which is based on \ac{tlm}-\ac{at} to provide cycle-accurate memory models without sacrificing simulation performance.
\subsection{DRAMSys}
DRAMSys is an open-source framework for design space exploration and provides the ability to simulate the latest \ac{jedec} \ac{dram} standards \cite{steiner2022a}.
The framework is optimized for high simulation speed and uses the \ac{at} coding style, while ensuring cycle-accurate results.
\Cref{img:dramsys} provides an overview of the internal architecture of DRAMSys, which consists of a frontend, a backend and the memory models.
\begin{figure}
\centering
\includegraphics[width=0.8\linewidth]{images/dramsys}
\caption[The internal architecture of DRAMSys]{The internal architecture of DRAMSys \cite{jung2017a}.}
\label{img:dramsys}
\end{figure}
The arbitration unit routes incoming packets to the appropriate channel controller based on the address mapping.
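As a minimal illustration of such a mapping, assume a hypothetical layout in which the channel index occupies $k$ address bits directly above the lowest $s$ bits; neither this layout nor the values below are taken from DRAMSys.
The channel $c$ selected for a byte address $a$ is then
\[
    c = \left\lfloor \frac{a}{2^{s}} \right\rfloor \bmod 2^{k}.
\]
With the assumed values $s = 6$ and $k = 3$, the address $a = \mathtt{0x1C0} = 448$ yields $c = \lfloor 448 / 64 \rfloor \bmod 8 = 7$, whereas $a = \mathtt{0x200} = 512$ yields $c = 8 \bmod 8 = 0$.
Under this assumed layout, consecutive 64-byte blocks are distributed across eight channels in a round-robin fashion.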
Each independent channel controller is responsible for controlling a single \ac{dram} channel and issuing the necessary \ac{dram} commands for read and write operations.
The scheduler, located within a channel controller, has the ability to reorder incoming requests to optimize for specific metrics.
In conjunction with the response queue, requests can be completed out of order, improving system performance with respect to the chosen metric.
At the frontend of DRAMSys, a variety of initiators can be connected, including traffic generators that issue random accesses, as well as sophisticated processor models such as gem5.
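The benefit of such reordering can be sketched with a deliberately simplified cost model, which is an assumption of this example and not the timing model of DRAMSys: an access to the currently open row of a bank costs $t_{\mathrm{hit}}$, while an access to any other row, or to a bank without an open row, costs $t_{\mathrm{miss}} > t_{\mathrm{hit}}$.
Let three requests $A$, $B$ and $C$ arrive in this order and target the rows $r_1$, $r_2$ and $r_1$ of the same, initially closed bank.
Served in arrival order, every access requires a different row than the one currently open, so that
\[
    T_{\mathrm{in\,order}} = 3\, t_{\mathrm{miss}}.
\]
A scheduler that hypothetically prefers accesses to the open row could instead serve the requests in the order $A$, $C$, $B$, which gives
\[
    T_{\mathrm{reordered}} = 2\, t_{\mathrm{miss}} + t_{\mathrm{hit}},
\]
a saving of $t_{\mathrm{miss}} - t_{\mathrm{hit}}$.
In this case, the response to $C$ is returned before the response to $B$, even though $B$ arrived first.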
In cases where such a processor model is used to execute a user application, DRAMSys uses its internal memory model to store and retrieve the requested data, rather than ignoring the contents of the request.
Among the \ac{jedec} \ac{dram} standards supported by DRAMSys is \aca{hbm}.
Thus, gem5 and DRAMSys together form a robust platform for implementing and researching the \aca{fimdram} architecture introduced by Samsung, entirely through a software model.
To achieve this, the \aca{hbm} \ac{dram} model must be extended to include the processing units integrated into each \ac{pch}.
The following section provides a detailed description of this implementation of \aca{fimdram}, the \ac{pim} virtual machine.