Start of kernel
@@ -1,10 +1,48 @@
\subsection{Application Kernel}
\label{sec:kernel}

With both the \aca{fimdram} model in DRAMSys and the software support library in place, it is now possible to write an application that runs on gem5 and leverages \ac{pim} to accelerate workloads.
In gem5, there are three different approaches to modeling a system:
\begin{itemize}
\item
Run the user-space application in \textbf{system call emulation} mode.
In this mode, the application is simulated in isolation, and its system calls are forwarded to the host operating system.
This mode has the lowest level of accuracy because many components of the memory system, such as page table walking and the \ac{tlb}, are implemented using heavily simplified models.
\item
Simulate the entire system in \textbf{full system} mode, booting a full Linux kernel and running the application to be benchmarked as a user-space program.
This mode is the most accurate, as it closely resembles a real deployment of an application.
It also provides an environment complete enough to develop device drivers without the need for a real system.
\item
Finally, run gem5 in full system mode, but boot a custom kernel in a \textbf{bare-metal} environment.
This approach is the most flexible, as the user has full control over the hardware configuration as well as the operating system.
The user application does not have to run in user space but can run in a privileged mode, making it easy to implement low-level routines without having to write a device driver with its user-space interface.
\end{itemize}

While the system call emulation mode is the simplest option, it was discarded due to its lack of accuracy and its inability to execute privileged instructions.
The full system mode, which boots a Linux kernel, provides the capabilities needed to implement the application.
However, because of its added complexity and the need to write a Linux device driver to execute privileged instructions and to control the non-cacheable memory regions, the bare-metal option was chosen instead.
Here, the self-written kernel has full control over the entire system, which is an advantage when implementing a minimal example that uses \aca{fimdram}.
On the other hand, some setup is required, such as initializing the page tables so that the \ac{mmu} of the processor can be enabled and programmed to mark memory regions as cacheable or non-cacheable.
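
The page table setup mentioned above can be illustrated with a minimal sketch. It assumes an AArch64 processor using the 4\,KiB translation granule and level-1 block descriptors that each map 1\,GiB; the names \texttt{MemoryKind}, \texttt{block\_descriptor} and \texttt{identity\_map} as well as the assignment of the \texttt{MAIR\_EL1} indices are hypothetical choices made for this illustration and are not taken from the actual kernel.

```rust
/// Index into MAIR_EL1 that selects the memory attributes of a mapping.
/// Which index holds which attribute is a choice of this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MemoryKind {
    /// Normal memory, inner/outer write-back cacheable (attribute byte 0xFF).
    Cacheable = 0,
    /// Normal memory, inner/outer non-cacheable (attribute byte 0x44).
    NonCacheable = 1,
}

/// Value for MAIR_EL1: attribute byte `i` occupies bits [8*i+7 : 8*i].
pub const MAIR_EL1: u64 = 0xFF | (0x44 << 8);

const BLOCK: u64 = 0b01; // bits [1:0]: valid block descriptor
const ATTR_INDX_SHIFT: u32 = 2; // bits [4:2]: index into MAIR_EL1
const INNER_SHAREABLE: u64 = 0b11 << 8; // bits [9:8]: shareability
const ACCESS_FLAG: u64 = 1 << 10; // bit 10: set to avoid an access flag fault
const GIB: u64 = 1 << 30;

/// Builds a level-1 block descriptor that maps the 1 GiB region at `phys_addr`.
pub fn block_descriptor(phys_addr: u64, kind: MemoryKind) -> u64 {
    assert!(phys_addr % GIB == 0, "level-1 blocks must be 1 GiB aligned");
    phys_addr | INNER_SHAREABLE | ACCESS_FLAG | ((kind as u64) << ATTR_INDX_SHIFT) | BLOCK
}

/// Fills an identity-mapped level-1 table. The 1 GiB blocks whose indices are
/// listed in `pim_blocks` (a hypothetical parameter) are marked non-cacheable.
pub fn identity_map(table: &mut [u64], pim_blocks: &[usize]) {
    for (i, entry) in table.iter_mut().enumerate() {
        let kind = if pim_blocks.contains(&i) {
            MemoryKind::NonCacheable
        } else {
            MemoryKind::Cacheable
        };
        *entry = block_descriptor(i as u64 * GIB, kind);
    }
}
```

On real hardware, the table address would additionally have to be written to \texttt{TTBR0\_EL1} and the translation configured in \texttt{TCR\_EL1} before the \ac{mmu} is enabled via \texttt{SCTLR\_EL1}; these steps are omitted here.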

% python config
% bare metal vs linux

\subsubsection{Boot Code}
% linker script
% start assembly script

\subsubsection{Cache Management}
% ARM page tables
% cache management

\subsubsection{Bare-Metal Utilities}
% Heap Allocator (linked list allocator?...)
% uart

\subsubsection{Memory Configuration}
% address mapping
% concrete numbers for mcconfig

\subsubsection{GEMV Microkernel}
% heap allocation

\subsubsection{Benchmark Environment}
% m5ops
@@ -11,13 +11,13 @@ Such a \ac{pim} library must include the following essential features to fully i
\item It should provide data structures to assemble \textbf{microkernels} and functions to transfer the microkernels to the \acp{crf} of the processing units.
\item To meet the \textbf{memory layout} requirements of the inputs and outputs of an algorithm, it should provide data structures to represent vectors and matrices according to the special layout constraints.
\item After switching the mode to \ac{abp}, the library should provide functionality to \textbf{execute a user-defined microkernel} by issuing the necessary memory requests through the execution of \ac{ld} and \ac{st} instructions.
\item For platforms where it is not possible to mark the \ac{pim} memory region as non-cacheable, the library should provide the necessary \textbf{cache management} operations to bypass the filtering effect of the cache and to generate the correct number of \ac{rd} and \ac{wr} \ac{dram} commands.
\end{itemize}
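
The cache management requirement from the last item can be sketched as follows. The sketch assumes a cache line size of 64 bytes and an AArch64 target; \texttt{cache\_lines} and \texttt{flush\_range} are hypothetical names and do not reflect the actual interface of the library. On other architectures, the maintenance instruction is replaced by a no-op so that the address computation can still be exercised.

```rust
/// Cache line size in bytes; 64 is an assumption of this sketch.
pub const CACHE_LINE: usize = 64;

/// Start addresses of all cache lines that overlap `[addr, addr + len)`.
pub fn cache_lines(addr: usize, len: usize) -> impl Iterator<Item = usize> {
    let first = addr & !(CACHE_LINE - 1);
    // An empty range must not touch any line, even if `addr` is unaligned.
    let end = if len == 0 { first } else { addr + len };
    (first..end).step_by(CACHE_LINE)
}

/// Cleans and invalidates one line so that the next access reaches the DRAM.
#[cfg(target_arch = "aarch64")]
fn clean_invalidate_line(line: usize) {
    // DC CIVAC: clean and invalidate by virtual address to the point of coherency.
    unsafe { core::arch::asm!("dc civac, {0}", in(reg) line) };
}

/// Placeholder for hosts other than AArch64; it does nothing.
#[cfg(not(target_arch = "aarch64"))]
fn clean_invalidate_line(_line: usize) {}

/// Flushes every cache line of a mapped buffer and returns the number of lines.
/// The caller must pass a range that is valid in the current address space.
pub fn flush_range(addr: usize, len: usize) -> usize {
    let mut count = 0;
    for line in cache_lines(addr, len) {
        clean_invalidate_line(line);
        count += 1;
    }
    // Wait until the maintenance operations have completed.
    #[cfg(target_arch = "aarch64")]
    unsafe {
        core::arch::asm!("dsb sy")
    };
    count
}
```

Flushing a range before each \ac{ld} and after each \ac{st} in this way would ensure that every access is turned into exactly one \ac{dram} command instead of being served by the cache.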

As already discussed in \cref{sec:vm}, for reasons of simplicity and debuggability, the host processor communicates with the \ac{pim} model in the \ac{dram} using a \ac{json}-based protocol.
To achieve this, a small shared library that defines the communication data structures as well as routines to serialize and deserialize them is linked into both the \ac{pim} support library and the \ac{pim} model in DRAMSys.
A predefined memory region is then used to differentiate these communication messages from the regular memory traffic.
Ideally, this memory region is also marked as non-cacheable so that the messages do not get stuck in the on-chip cache.
Alternatively, the software library must ensure that the cache is flushed after the \ac{json} message is written to the memory region.
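
The following sketch illustrates how such a shared library could look for a mode switch request. The message format, the field name \texttt{bank\_mode}, the address window \texttt{MESSAGE\_REGION} and all function names are hypothetical and chosen only for this illustration; the serialization is written by hand to keep the example self-contained, whereas a real implementation would more likely rely on a \ac{json} library.

```rust
use std::ops::Range;

/// Address window reserved for control messages (hypothetical values).
pub const MESSAGE_REGION: Range<u64> = 0x8000_0000..0x8000_1000;

/// Operating modes that can be requested by the host.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum BankMode {
    SingleBank,
    AllBank,
    AllBankPim,
}

impl BankMode {
    fn name(self) -> &'static str {
        match self {
            BankMode::SingleBank => "single_bank",
            BankMode::AllBank => "all_bank",
            BankMode::AllBankPim => "all_bank_pim",
        }
    }

    fn from_name(name: &str) -> Option<Self> {
        match name {
            "single_bank" => Some(BankMode::SingleBank),
            "all_bank" => Some(BankMode::AllBank),
            "all_bank_pim" => Some(BankMode::AllBankPim),
            _ => None,
        }
    }
}

/// Host side: serializes a mode switch request into a JSON object.
pub fn serialize(mode: BankMode) -> String {
    format!("{{\"bank_mode\":\"{}\"}}", mode.name())
}

/// Model side: parses exactly what `serialize` produces.
/// This is not a general JSON parser.
pub fn deserialize(message: &str) -> Option<BankMode> {
    let name = message
        .trim()
        .strip_prefix("{\"bank_mode\":\"")?
        .strip_suffix("\"}")?;
    BankMode::from_name(name)
}

/// Model side: a write is treated as a control message only if it targets
/// the reserved window; everything else is regular memory traffic.
pub fn is_message(address: u64) -> bool {
    MESSAGE_REGION.contains(&address)
}
```

Because both sides link the same definitions, a change to the message format cannot silently lead to a mismatch between the host and the model.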

With the mode setting implemented, the shared library also provides type definitions to represent the \ac{pim} instructions in memory and to transfer entire microkernels consisting of 32 instructions to the processing units.
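
A minimal sketch of such type definitions is given below. Only the length of 32 instructions is taken from the text; the reduced instruction set, its numeric encoding and the names \texttt{Instruction} and \texttt{Microkernel} are assumptions made for this illustration and do not reproduce the actual instruction format.

```rust
/// Number of instruction slots in a microkernel, as stated in the text.
pub const MICROKERNEL_LEN: usize = 32;

/// Hypothetical, heavily reduced instruction set with a made-up encoding.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u32)]
pub enum Instruction {
    Nop = 0,
    Mac = 1,
    Exit = 2,
}

/// A complete microkernel with exactly `MICROKERNEL_LEN` slots.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Microkernel(pub [Instruction; MICROKERNEL_LEN]);

impl Microkernel {
    /// Builds a microkernel from a shorter program and pads the remaining
    /// slots with `Nop`. Returns `None` if the program does not fit.
    pub fn new(program: &[Instruction]) -> Option<Self> {
        if program.len() > MICROKERNEL_LEN {
            return None;
        }
        let mut slots = [Instruction::Nop; MICROKERNEL_LEN];
        slots[..program.len()].copy_from_slice(program);
        Some(Self(slots))
    }

    /// Encodes all slots as 32-bit words, ready to be transferred.
    pub fn encode(&self) -> [u32; MICROKERNEL_LEN] {
        self.0.map(|instruction| instruction as u32)
    }
}
```

Fixing the length in the type means that a microkernel with too many instructions is rejected when it is constructed rather than when it is transferred.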