Start of kernel

2024-02-15 21:09:14 +01:00
parent 1e993eeb28
commit df8ef883b3
7 changed files with 54 additions and 7 deletions

@@ -1,10 +1,48 @@
\subsection{Application Kernel}
\label{sec:kernel}
With both the \aca{fimdram} model in DRAMSys and the software support library in place, it is now possible to write an application that runs on gem5 and leverages \ac{pim} to accelerate workloads.
In gem5, there are three different approaches to modeling a system:
\begin{itemize}
\item
Run the user-space application in \textbf{system call emulation} mode.
In this mode, the application is simulated in isolation, while forwarding system calls to the host operating system.
This mode has the lowest level of accuracy because many components of the memory system, such as page table walking and the \ac{tlb}, are implemented using very simplified models.
\item
Simulate the entire system in \textbf{full system} mode, booting a full Linux kernel and running the application to be benchmarked as a user-space program.
This mode is the most accurate, as it closely resembles a real deployment of an application.
It also provides a complete enough environment to develop device drivers, without the need for a real system.
\item
Finally, run gem5 in full system mode, but boot a custom kernel in a \textbf{bare-metal} environment.
This approach is the most flexible, as the user has full control over the hardware configuration as well as the operating system.
The user application does not have to run in user space, but can run in a privileged mode, making it easy to implement low-level routines without having to write a device driver and its user-space interface.
\end{itemize}
While the system call emulation mode is the simplest option, it has been discarded due to its lack of accuracy and inability to execute privileged instructions.
The full system mode, which boots a Linux kernel, provides the necessary capabilities to implement the application.
However, due to its added complexity and the need to write a Linux device driver to execute privileged instructions and control the non-cacheable memory regions, the bare-metal option was chosen instead.
Here, the self-written kernel has full control over the complete system, which is an advantage when implementing a minimal example utilizing \aca{fimdram}.
On the other hand, some setup is required, such as initializing the page tables so that the \ac{mmu} of the processor can be enabled and programmed to mark memory regions as cacheable and non-cacheable.
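The page table setup mentioned above can be illustrated with a minimal sketch in C.
It assumes an AArch64 processor with a 4\,KiB translation granule and a single level-1 table of 1\,GiB block descriptors; the text itself only states that the \ac{mmu} must be programmed with cacheable and non-cacheable regions.
The attribute slot assignment and the helpers \texttt{block\_descriptor} and \texttt{fill\_identity\_map} are hypothetical names chosen for illustration and are not taken from the actual kernel.

```c
#include <stdint.h>

/* Hypothetical assignment of MAIR_EL1 attribute slots for this sketch. */
enum { ATTR_NORMAL_WB = 0, ATTR_NORMAL_NC = 1 };

/* Value for MAIR_EL1: slot 0 = 0xFF (Normal, write-back cacheable),
 * slot 1 = 0x44 (Normal, non-cacheable). */
#define MAIR_VALUE ((0xFFull << (8 * ATTR_NORMAL_WB)) | \
                    (0x44ull << (8 * ATTR_NORMAL_NC)))

#define DESC_BLOCK    0x1ull        /* bits[1:0] = 0b01: block descriptor */
#define DESC_SH_INNER (3ull << 8)   /* SH[1:0]: inner shareable */
#define DESC_AF       (1ull << 10)  /* access flag, avoids access faults */
#define BLOCK_SIZE    (1ull << 30)  /* 1 GiB per level-1 entry (4 KiB granule) */

/* Build a level-1 block descriptor for a 1 GiB-aligned physical address.
 * attr_index selects one of the eight MAIR_EL1 slots (AttrIndx, bits[4:2]). */
static uint64_t block_descriptor(uint64_t phys_addr, unsigned attr_index)
{
    return (phys_addr & ~(BLOCK_SIZE - 1))
         | DESC_AF
         | DESC_SH_INNER
         | ((uint64_t)(attr_index & 7u) << 2)
         | DESC_BLOCK;
}

/* Identity-map the first `count` GiB. Blocks with an index in
 * [nc_first, nc_last] are marked non-cacheable, all others cacheable. */
static void fill_identity_map(uint64_t *table, unsigned count,
                              unsigned nc_first, unsigned nc_last)
{
    for (unsigned i = 0; i < count; i++) {
        unsigned attr = (i >= nc_first && i <= nc_last) ? ATTR_NORMAL_NC
                                                        : ATTR_NORMAL_WB;
        table[i] = block_descriptor((uint64_t)i * BLOCK_SIZE, attr);
    }
}
```

In such a setup, the boot code would write \texttt{MAIR\_VALUE} to the memory attribute register and the table address to the translation table base register before enabling the \ac{mmu}; those privileged register writes are omitted here.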
% python config
% bare metal vs linux
\subsubsection{Boot Code}
% linker script
% start assembly script
\subsubsection{Cache Management}
% ARM page tables
% cache management
\subsubsection{Bare-Metal Utilities}
% Heap Allocator (linked list allocator?...)
% uart
\subsubsection{Memory Configuration}
% address mapping
% concrete numbers for mcconfig
\subsubsection{GEMV Microkernel}
% heap allocation
\subsubsection{Benchmark Environment}
% m5ops

@@ -11,13 +11,13 @@ Such a \ac{pim} library must include the following essential features to fully i
\item It should provide data structures to assemble \textbf{microkernels} and functions to transfer the microkernels to the \acp{crf} of the processing units.
\item To meet the \textbf{memory layout} requirements of the inputs and outputs of an algorithm, it should provide data structures to represent vectors and matrices according to the special layout constraints.
\item After switching the mode to \ac{abp}, the library should provide functionality to \textbf{execute a user-defined microkernel} by issuing the necessary memory requests through the execution of \ac{ld} and \ac{st} instructions.
\item For platforms, where it is not possible to mark the \ac{pim} memory region as uncacheable, the library should provide the necessary \textbf{cache management} operations to bypass the cache filtering and to generate the right amount of \ac{rd} and \ac{wr} \ac{dram} commands.
\item For platforms where it is not possible to mark the \ac{pim} memory region as non-cacheable, the library should provide the necessary \textbf{cache management} operations to bypass the cache filtering and to generate the right amount of \ac{rd} and \ac{wr} \ac{dram} commands.
\end{itemize}
As already discussed in \cref{sec:vm}, for reasons of simplicity and debuggability, the host processor communicates with the \ac{pim} model in the \ac{dram} using a \ac{json}-based protocol.
To achieve this, a small shared library that defines the communication data structures as well as routines to serialize and deserialize them is linked by both the \ac{pim} support library and the \ac{pim} model in DRAMSys.
A predefined memory region is then used to differentiate these communication messages from the regular memory traffic.
Ideally, this memory region is also set as uncacheable, so that the messages do not get stuck in the on-chip cache.
Ideally, this memory region is also set as non-cacheable, so that the messages do not get stuck in the on-chip cache.
Alternatively, the software library must ensure that the cache is flushed after the \ac{json} message is written to the memory region.
With the mode setting implemented, the shared library also provides type definitions to represent the \ac{pim} instructions in memory and to transfer entire microkernels consisting of 32 instructions to the processing units.
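
The following C sketch shows one way such type definitions could look.
Only the size of 32 instructions per microkernel is taken from the text; the 32-bit instruction width, the no-operation encoding \texttt{PIM\_NOP}, and the names \texttt{pim\_instruction}, \texttt{pim\_microkernel} and \texttt{microkernel\_build} are assumptions made for illustration and do not reflect the actual shared library.

```c
#include <stddef.h>
#include <stdint.h>

#define MICROKERNEL_SLOTS 32u /* from the text: 32 instructions per microkernel */
#define PIM_NOP 0u            /* hypothetical encoding, for illustration only */

/* Assumed in-memory representation of one PIM instruction (32 bit). */
typedef struct {
    uint32_t raw;
} pim_instruction;

/* A complete microkernel always occupies all 32 slots. */
typedef struct {
    pim_instruction slots[MICROKERNEL_SLOTS];
} pim_microkernel;

/* Copy a program of n instructions into a microkernel and pad the
 * remaining slots with no-operations.
 * Returns 0 on success and -1 if the program does not fit. */
static int microkernel_build(pim_microkernel *mk,
                             const pim_instruction *prog, size_t n)
{
    if (n > MICROKERNEL_SLOTS) {
        return -1;
    }
    for (size_t i = 0; i < MICROKERNEL_SLOTS; i++) {
        mk->slots[i].raw = (i < n) ? prog[i].raw : PIM_NOP;
    }
    return 0;
}
```

Using a fixed-size array keeps the layout of a microkernel independent of the length of the user program, so the same transfer routine can be used for every microkernel.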