Memory Configuration

2024-02-16 16:36:06 +01:00
parent c04c3fa829
commit 21c2489766
3 changed files with 67 additions and 2 deletions


@@ -328,3 +328,7 @@
short = TCP,
long = Transmission Control Protocol,
}
\DeclareAcronym{llff}{
short = LLFF,
long = Linked List First Fit,
}


@@ -4,6 +4,7 @@
% what to do better:
% implement Samsung's real mode switching and programming of crfs
% implement linux kernel driver
% -> alignment requirements -> huge tables
% make use of Samsung PIM in a real dnn application and measure the effects
% compare with SIMD insts in ARM
% compare with real TPUs and GPU platforms


@@ -92,8 +92,68 @@ In order to incorporate this memory allocator, it has been initialized by provid
The allocator can then dynamically use sections of this arena to allocate the \ac{pim} data structures.
\subsubsection{Memory Configuration}
% address mapping
% concrete numbers for mcconfig
As already discussed in \cref{sec:memory_layout} and \cref{sec:microkernel_execution}, certain requirements are placed on the configuration of the memory system, such as the \ac{am}.
These settings are supplied when DRAMSys is instantiated and connected to the gem5 memory bus.
In \aca{hbm}, the burst size of a memory access is exactly \qty{32}{\byte}, which is the smallest granularity with which the \ac{dram} can be accessed.
Since $\log_2(32)=5$, the lowest five bits of any valid burst-aligned memory address must therefore be zero.
The next higher bits should directly select between the different memory banks, as the banks are coupled with the different processing units.
Because a 16-wide \ac{fp16} vector occupies exactly \qty{32}{\byte} and the matrices are stored in column-major format, consecutive vectors in the linear address space should be spread across all banks so that the processing units can perform the \ac{mac} operation concurrently.
As a result, the \ac{am} maps the lowest address bits to a portion of the column bits, followed by all bank bits.
These are then followed by the remaining column bits and, finally, the row bits.
The simplified \ac{am} following this scheme is shown in \cref{img:hbm2_am}.
\begin{figure}
\centering
\begin{bytefield}[bitwidth=4mm,bitheight=5mm]{31}
\bitheader[endianness=big]{0,2,3,4,5,9,10,14,15,30} \\
\bitbox{16}{Row}
\bitbox{5}{Column}
\bitbox{5}{Bank}
\bitbox{2}{C}
\bitbox{3}[bgcolor=verylightgray]{}
\end{bytefield}
\caption[Simplified \aca{hbm} address mapping with a split column mapping]{Simplified \aca{hbm} address mapping with a split column mapping.}
\label{img:hbm2_am}
\end{figure}
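Read as arithmetic on a byte address $a$ with $0 \le a < 2^{31}$, the mapping in \cref{img:hbm2_am} can be sketched as follows.
The symbols $a$, $\mathit{bank}$, $\mathit{col}$ and $\mathit{row}$ are introduced here only for illustration, the field boundaries are read off the bit header of the figure, and the upper column field is assumed to hold the more significant column bits:
% illustrative sketch only; assumes amsmath is loaded for align*
\begin{align*}
\mathit{bank}(a) &= \lfloor a / 2^{5} \rfloor \bmod 2^{5}, \\
\mathit{col}(a) &= 2^{2} \cdot \bigl(\lfloor a / 2^{10} \rfloor \bmod 2^{5}\bigr) + \bigl(\lfloor a / 2^{3} \rfloor \bmod 2^{2}\bigr), \\
\mathit{row}(a) &= \lfloor a / 2^{15} \rfloor .
\end{align*}
For a vector aligned to \qty{32}{\byte} at address $a = 32k$, this gives $\mathit{bank}(a) = k \bmod 32$, so that consecutive vectors fall into consecutive values of the bank field.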
In addition to the \ac{am}, the \aca{hbm} system can be configured in terms of stack count, stack height, bank grouping, and memory array dimensions.
The concrete values for these parameters are listed in \cref{tab:memspec}.
\begin{table}
\centering
% \resizebox{\linewidth}{!}{%
\begin{tblr}{
hlines,
vlines,
cell{2-8}{3} = {r},
hline{2} = {-}{solid,black},
hline{2} = {2}{-}{solid,black},
}
Parameter & Description & Value \\
Number of Bank Groups & Bank Groups per \ac{pch} & 4 \\
Number of Banks & Banks per \ac{pch} & 16 \\
Number of \acp{pch} & \acp{pch} per Channel & 2 \\
Number of Channels & Total Number of Channels & 1 \\
Number of Columns & Columns per Memory Array & 128 \\
Number of Rows & Rows per Memory Array & 65536 \\
Width & Width of the Data Bus in Bits & 64
\end{tblr}
% }
\caption[A list of the used configuration parameters of \aca{hbm}]{A list of the used configuration parameters of \aca{hbm}.}
\label{tab:memspec}
\end{table}
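As a plausibility check, the field widths of \cref{img:hbm2_am} can be related to the values in \cref{tab:memspec}.
The following sketch assumes that the data bus width is given in bits and that the five bank bits of the simplified mapping also select the \ac{pch}; both are interpretations rather than statements made explicitly above:
% illustrative cross-check only; assumes amsmath is loaded for align* and \text
\begin{align*}
\text{byte offset:}\quad & \log_2(64 / 8) = 3 \\
\text{column:}\quad & \log_2(128) = 7 = 5 + 2 \\
\text{bank:}\quad & \log_2(16 \cdot 2) = 5 \\
\text{row:}\quad & \log_2(65536) = 16 \\
\text{total:}\quad & 3 + 7 + 5 + 16 = 31
\end{align*}
Under these assumptions, the two low column bits together with the three byte-offset bits cover the $2^{2} \cdot 8 = 32$ bytes of one burst, and the 31 address bits cover $2^{31}$ bytes for the simulated channel.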
As only one channel is simulated, the simulation does not take other memory stacks or other memory dies within a stack into account.
Additional channels would only be needed for matrices larger than those used in this thesis, and since the channels are completely independent of each other, this restriction does not change the timing behavior of the simulation.
\subsubsection{GEMV Microkernel}
% heap allocation