\section{System-Level Modeling}
\label{sec:vp}
To evaluate the impact of \ac{pim} on the performance and power consumption of various applications, it is essential to perform simulations.
Such simulations make it possible to investigate critical factors such as the \ac{pim} microkernel setup overhead and the actual performance improvement of the \ac{pim} kernel compared to traditional platforms.
They may even reveal potential improvements to the \ac{pim} architecture.
In addition, the suitability of different applications for \ac{pim} can be evaluated, as well as the influence of the specific memory layout requirements on the application software.
\subsection{Virtual Prototypes}
To perform such simulations, it is necessary to use a simulation model, commonly referred to as a \ac{vp}.
\Acp{vp} act as executable software models of a physical hardware system, allowing the architecture of the system to be completely simulated in software.
This in turn enables software development and the identification of potential platform-specific software bugs without the need for the actual hardware implementation \cite{antonino2018}.
\Acp{vp} provide full visibility and control over the entire simulated system, helping to identify bottlenecks and potential specification errors in the design.
They also allow the exploration of the design space.
In the case of \aca{fimdram}, for example, this includes varying the ratio of \ac{pim} units to memory banks and observing the effect on the performance of the \ac{pim} microkernel.

However, choosing an appropriate level of abstraction for the software model is critical: it must be detailed enough to support well-informed statements about the system, yet not so low-level, as with the \ac{rtl}, that the simulation speed of the model itself suffers.
A viable compromise is the \ac{at} abstraction level of the \ac{tlm} technique, which is standardized as part of SystemC \cite{systemc2023} and widely used for virtual prototyping.
The \ac{at} coding style simplifies the modeling of communication between system components by expressing it solely as function calls that are synchronized at distinct points in time.
This approach eliminates the need to simulate complex bus protocols while maintaining the accuracy required for design space exploration and performance evaluation.
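To illustrate the idea of synchronizing only at distinct points in time, the following minimal Python sketch models a single transaction with the four phases of the TLM-2.0 base protocol (\texttt{BEGIN\_REQ}, \texttt{END\_REQ}, \texttt{BEGIN\_RESP}, \texttt{END\_RESP}).
It is not SystemC code: the function \texttt{simulate\_transaction} and all delay parameters are hypothetical and serve only to show that a transaction is described by a handful of timed events rather than by every clock cycle.

```python
import heapq

# Phase names follow the TLM-2.0 base protocol; all delays are made-up inputs.
PHASES = ("BEGIN_REQ", "END_REQ", "BEGIN_RESP", "END_RESP")


def simulate_transaction(start_ns, accept_ns, latency_ns, response_accept_ns):
    """Return the (time, phase) timing points of a single transaction."""
    delays = (accept_ns, latency_ns, response_accept_ns)
    events = [(start_ns, 0)]  # (timestamp in ns, index into PHASES)
    trace = []
    while events:
        now, phase = heapq.heappop(events)
        trace.append((now, PHASES[phase]))
        if phase < len(delays):
            # Schedule the next phase instead of simulating every clock cycle.
            heapq.heappush(events, (now + delays[phase], phase + 1))
    return trace


if __name__ == "__main__":
    for time_ns, phase in simulate_transaction(0, 2, 30, 1):
        print(f"{time_ns:>3} ns  {phase}")
```

Regardless of how many clock cycles elapse between request and response, the simulator in this sketch processes only four events per transaction.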

Two different \ac{vp} simulation frameworks used in the implementation of the \aca{fimdram} software model, namely gem5 and DRAMSys, are introduced in the following sections.
\subsection{The gem5 Simulator}
The gem5 simulator is an open-source computer architecture simulation platform used for system-level architecture research \cite{lowe-power2020}.
It allows the measurement of various statistics, including runtime, memory bandwidth, and internal processor metrics, across different hardware configurations.
To this end, gem5 executes a user application on its detailed processor models with accurate timing.
It consists of a simulator core and parameterized models for many components, including out-of-order processors, bus systems, and \ac{dram}.
As a result, gem5 provides a comprehensive framework for simulating and analyzing complex computer systems.

Two different modes can be used with gem5: full system simulation and system call emulation.
In full system mode, gem5 boots a complete operating system kernel and runs the user's application on top of it to generate detailed statistics.
In system call emulation mode, the simulated application uses the syscall interface of the host operating system, ignoring the timing effects of these operating system routines.
Consequently, the full system mode provides better accuracy, while the system call emulation mode takes less time to complete the simulation.

In addition to the integrated components of the platform, gem5 provides a SystemC \ac{api} to enable the use of external SystemC models.
An example of such an external model is the \ac{dram} simulator DRAMSys, which is based on \ac{tlm}-\ac{at} to provide cycle-accurate memory models without sacrificing simulation performance.
\subsection{DRAMSys}
DRAMSys is an open-source framework for design space exploration and provides the ability to simulate the latest \ac{jedec} \ac{dram} standards \cite{steiner2022a}.
The framework is optimized for high simulation speed and uses the \ac{at} coding style, while ensuring cycle-accurate results.
\Cref{img:dramsys} provides an overview of the internal architecture of DRAMSys, which consists of a frontend, a backend and the memory models.

\begin{figure}
\centering
\includegraphics[width=0.8\linewidth]{images/dramsys}
\caption[The internal architecture of DRAMSys]{The internal architecture of DRAMSys \cite{jung2017a}.}
\label{img:dramsys}
\end{figure}
The arbitration unit routes incoming packets to the appropriate channel controller based on the address mapping.
Each independent channel controller is responsible for controlling a single \ac{dram} channel and issuing the necessary \ac{dram} commands for read and write operations.
The scheduler, located within a channel controller, can reorder incoming requests to optimize for a specific metric.
In conjunction with the response queue, this allows requests to be completed out of order, which improves overall system performance with respect to that metric.
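The interplay of address mapping, arbitration and request reordering can be sketched in a few lines of Python.
The sketch below is an illustration under stated assumptions and not the DRAMSys implementation: the bit layout of the address (\texttt{COLUMN\_BITS}, \texttt{CHANNEL\_BITS}, \texttt{BANK\_BITS}) is hypothetical, and the reordering policy, which prefers requests that hit an already open row in the manner of a first-ready, first-come-first-served scheduler, is chosen only as one example of a metric a scheduler might optimize for.

```python
from collections import namedtuple

# Hypothetical address layout, lowest bits first: 5 column bits, 1 channel bit,
# 2 bank bits; all remaining upper bits select the row.
COLUMN_BITS, CHANNEL_BITS, BANK_BITS = 5, 1, 2

Decoded = namedtuple("Decoded", "channel bank row column")


def decode(address):
    """Split a flat address into DRAM coordinates by slicing bit fields."""
    column = address & ((1 << COLUMN_BITS) - 1)
    address >>= COLUMN_BITS
    channel = address & ((1 << CHANNEL_BITS) - 1)
    address >>= CHANNEL_BITS
    bank = address & ((1 << BANK_BITS) - 1)
    row = address >> BANK_BITS
    return Decoded(channel, bank, row, column)


def arbitrate(addresses):
    """Sort requests into one queue per channel, preserving arrival order."""
    queues = {channel: [] for channel in range(1 << CHANNEL_BITS)}
    for address in addresses:
        request = decode(address)
        queues[request.channel].append(request)
    return queues


def schedule_row_hits_first(queue):
    """Serve the oldest request that hits an open row, else the oldest one."""
    pending, open_rows, order = list(queue), {}, []
    while pending:
        hit = next((r for r in pending if open_rows.get(r.bank) == r.row), None)
        chosen = hit if hit is not None else pending[0]
        pending.remove(chosen)
        open_rows[chosen.bank] = chosen.row  # the row stays open afterwards
        order.append(chosen)
    return order


if __name__ == "__main__":
    queues = arbitrate([256, 512, 32, 257])
    for request in schedule_row_hits_first(queues[0]):
        print(request)
```

In this example, the third request to channel 0 targets the same row as the first one and is therefore served before the second request, which would require opening a different row.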

At the frontend of DRAMSys, a variety of initiators can be connected, including traffic generators that generate random accesses, as well as sophisticated processor models such as gem5.
When such a processor model is used to execute a user application, DRAMSys uses its internal memory model to store and retrieve the requested data, rather than ignoring the contents of the request.

DRAMSys provides support for the latest \ac{jedec} \ac{dram} standards, including \aca{hbm}.
Thus, gem5 and DRAMSys together form a robust platform for implementing and studying the \aca{fimdram} architecture introduced by Samsung entirely as a software model.
To achieve this, the \aca{hbm} \ac{dram} model must be extended to include the processing units integrated into each \ac{pch}.
The following section provides a detailed description of this implementation of \aca{fimdram}, the \ac{pim} virtual machine.