diff --git a/src/acronyms.tex b/src/acronyms.tex index ba8ed1f..6fcdb79 100644 --- a/src/acronyms.tex +++ b/src/acronyms.tex @@ -37,6 +37,7 @@ \DeclareAcronym{dram}{ short = DRAM, long = dynamic random-access memory, + long-plural-form = dynamic random-access memories, } \DeclareAcronym{ram}{ short = RAM, @@ -185,7 +186,7 @@ } \DeclareAcronym{dpu}{ short = DPU, - long = DRAM Processing Units, + long = DRAM Processing Unit, } \DeclareAcronym{risc}{ short = RISC, diff --git a/src/chapters/dram.tex b/src/chapters/dram.tex index 9a9155f..3947ed6 100644 --- a/src/chapters/dram.tex +++ b/src/chapters/dram.tex @@ -49,7 +49,7 @@ A \ac{dimm} may also consist of several independent \textit{ranks}, which are co Besides the data bus, the channel also consists of the \textit{command bus} and the \textit{address bus}. Over the command bus, the commands necessary to control memory are issued by the \textit{memory controller}, which sits between the \ac{dram} and the \ac{mpsoc}. -For example, to read data, the memory controller may first issue a \ac{pre} command to precharge the bitlines in a certain bank, followed by an \iac{act} command to load the contents of a row into the \acp{psa}, and finally a \ac{rd} command to move the data from the \acp{psa} to the \acp{ssa} where it can further be exposed onto the data bus. +For example, to read data, the memory controller may first issue a \ac{pre} command to precharge the bitlines in a certain bank, followed by an \ac{act} command to load the contents of a row into the \acp{psa}, and finally a \ac{rd} command to move the data from the \acp{psa} to the \acp{ssa} where it can further be exposed onto the data bus. The value on the address bus determines the row, column, bank and rank used during the respective commands, while it is the responsibility of the memory controller to translate the \ac{mpsoc}-side address to the respective components in a process called \ac{am}.
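The PRE/ACT/RD read sequence described in the dram.tex hunk above can be sketched as a toy model. Everything here (class name, command tuples, the open-row bookkeeping) is an assumption for illustration, not code from the thesis:

```python
# Illustrative sketch of the PRE -> ACT -> RD sequence; names and structure
# are invented for the example, not taken from the thesis implementation.
class ToyMemoryController:
    def __init__(self):
        # bank -> currently activated row (None means the bank is precharged)
        self.open_row = {}

    def read(self, bank, row, col):
        """Return the command sequence needed to read (bank, row, col)."""
        cmds = []
        if self.open_row.get(bank) != row:          # row miss
            if self.open_row.get(bank) is not None:
                cmds.append(("PRE", bank))          # precharge the bitlines
            cmds.append(("ACT", bank, row))         # load row into the primary sense amps
            self.open_row[bank] = row
        cmds.append(("RD", bank, col))              # move data toward the data bus
        return cmds
```

A read that hits the already-open row needs only RD, while switching rows costs an extra PRE and ACT; this is exactly the "row miss" that address mapping tries to minimize.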
The \ac{am} ensures that the number of \textit{row misses}, i.e., the need for precharging and activating another row, is minimized. % One particularly common \ac{am} scheme is called \textit{Bank Interleaving} \cite{jung2017a}, which maps the lower address bits to the columns, followed by the ranks and banks, and the highest bits to the rows. @@ -102,7 +102,7 @@ Several \ac{dram} dies are stacked on top of each other and connected with \acp{ \begin{figure} \centering \includegraphics[width=0.8\linewidth]{images/sip} - \caption[Cross-section view of an \ac{hbm} \ac{sip}.]{Cross-section view of a \ac{hbm} \ac{sip} \cite{lee2021}.} + \caption[Cross-section view of an \ac{hbm} \ac{sip}.]{Cross-section view of an \ac{hbm} \ac{sip} \cite{lee2021}.} \label{img:sip} \end{figure} Such a cube is then placed onto a common silicon interposer that connects the \ac{dram} to its host processor. @@ -112,7 +112,7 @@ For example, compared to a conventional \ac{ddr4} \ac{dram}, this tight integrat A memory stack supports up to eight independent memory channels, each containing up to 16 banks divided into four bank groups. The command, address and data buses operate at \ac{ddr}, i.e., they transfer two words per interface clock cycle $t_{CK}$. The \aca{hbm} standard defines two modes of operation: in legacy mode, the data bus operates as is. -In \ac{pch} mode, the data bus is split in half (i.e., 64-bit) to allow independent data tranfer, further increasing parallelism, while sharing a common command and address bus between the two \acp{pch}. +In \ac{pch} mode, the data bus is split in half (i.e., 64-bit) to allow independent data transfer, further increasing parallelism, while sharing a common command and address bus between the two \acp{pch}.
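The Bank Interleaving scheme described above (lowest bits map to the columns, then ranks and banks, highest bits to the rows) can be illustrated with a short decode sketch. The field widths below are invented for the example; real widths depend on the device geometry:

```python
# Assumed, illustrative bit widths -- not taken from any particular DRAM.
COL_BITS, RANK_BITS, BANK_BITS, ROW_BITS = 10, 1, 4, 15

def decode(addr):
    """Split a flat address into (row, bank, rank, col), low bits first."""
    col = addr & ((1 << COL_BITS) - 1)
    addr >>= COL_BITS
    rank = addr & ((1 << RANK_BITS) - 1)
    addr >>= RANK_BITS
    bank = addr & ((1 << BANK_BITS) - 1)
    addr >>= BANK_BITS
    row = addr & ((1 << ROW_BITS) - 1)
    return row, bank, rank, col
```

Because the column bits sit lowest, a sequential access stream walks through an entire row (and then through the other ranks and banks) before the row field changes, which is why this mapping keeps row misses rare.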
With a $t_{CK}$ of $\qty{1}{\nano\second}$ (i.e., a $\qty{1}{\giga\hertz}$ interface clock), \aca{hbm} achieves a pin transfer rate of $\qty{2}{\giga T \per\second}$, which results in $\qty[per-mode=symbol]{16}{\giga\byte\per\second}$ per \ac{pch} and a total of $\qty[per-mode = symbol]{256}{\giga\byte\per\second}$ for the 1024-bit wide data bus of each stack. A single data transfer is performed with either a \ac{bl} of 2 in legacy mode or 4 in \ac{pch} mode. Thus, accessing \aca{hbm} in \ac{pch} mode transmits a $\qty{256}{\bit}=\qty{32}{\byte}$ burst with a \ac{bl} of four over the $\qty{64}{\bit}$ wide data bus. diff --git a/src/chapters/implementation/library.tex b/src/chapters/implementation/library.tex index d63b5dc..e474cc8 100644 --- a/src/chapters/implementation/library.tex +++ b/src/chapters/implementation/library.tex @@ -15,7 +15,7 @@ Such a \ac{pim} library must include the following essential features to fully i \end{itemize} As already discussed in \cref{sec:vm}, for simplicity and debuggability reasons, the host processor communicates with the \ac{pim} model in the \ac{dram} using a \ac{json}-based protocol. -To achieve this, a small shared library, that defines the communication data structures as well as routines to serialize and deserialize them, is linked by both the \ac{pim} support library as well as the \ac{pim} model in DRAMSys. +To achieve this, a small shared library, which defines the communication data structures as well as routines to serialize and deserialize them, is linked by both the \ac{pim} support library and the \ac{pim} model in DRAMSys. A predefined memory region is then used to differentiate these communication messages from the regular memory traffic. Ideally, this memory region is also set as non-cacheable, so that the messages do not get stuck in the on-chip cache. Alternatively, the software library must ensure that the cache is flushed after the \ac{json} message is written to the memory region.
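The library.tex hunk above describes a JSON protocol whose messages are told apart from regular traffic purely by a predefined memory region. A minimal sketch of that idea, with an invented region base/size and message layout (nothing here reflects the actual shared library):

```python
import json

# Assumed, illustrative constants -- the real reserved region is defined
# by the implementation, not here.
PIM_MSG_BASE = 0x40000000
PIM_MSG_SIZE = 0x1000

def serialize(msg):
    """Encode a message dict for placement into the reserved region."""
    payload = json.dumps(msg).encode("utf-8")
    if len(payload) >= PIM_MSG_SIZE:
        raise ValueError("message does not fit in the reserved region")
    return payload

def deserialize(raw):
    """Decode bytes read back out of the reserved region."""
    return json.loads(raw.decode("utf-8"))

def is_pim_message(addr):
    # Classification is by target address alone, as the text describes;
    # everything outside the window is ordinary memory traffic.
    return PIM_MSG_BASE <= addr < PIM_MSG_BASE + PIM_MSG_SIZE
```

This also makes the cacheability concern concrete: if writes to `[PIM_MSG_BASE, PIM_MSG_BASE + PIM_MSG_SIZE)` linger in the on-chip cache, the model never observes the message, hence the non-cacheable mapping or explicit flush.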
diff --git a/src/chapters/implementation/vm.tex b/src/chapters/implementation/vm.tex index 71d34aa..2eec0de 100644 --- a/src/chapters/implementation/vm.tex +++ b/src/chapters/implementation/vm.tex @@ -61,7 +61,7 @@ If the new jump counter has not yet reached zero, the jump to the offset instruc If not, the execution continues as is. This implementation only works for non-nested JUMP instructions, as each level of nesting would require its own jump counter. From the information provided by Samsung, it is not clear whether nested JUMP instructions are implemented in \aca{fimdram}. -However, none of the microkernels examined in this thesis use nested JUMPs. +However, none of the microkernels examined in this thesis use nested JUMP instructions. As already seen in \cref{tab:instruction_set}, only the FILL instruction supports writing to the memory bank. Therefore, it is the only instruction implemented in the \texttt{execute\_write} method. diff --git a/src/chapters/vp.tex b/src/chapters/vp.tex index f86f56a..037365b 100644 --- a/src/chapters/vp.tex +++ b/src/chapters/vp.tex @@ -30,7 +30,7 @@ As a result, gem5 provides a comprehensive framework for simulating and analyzin Two different modes can be used with gem5: full system simulation and system call emulation. In full system mode, gem5 boots a complete operating system kernel and runs the user's application on top of it to generate detailed statistics. -In system call emulation mode, the simulated application uses the syscall interface of the host operating system, ignoring the timing effects of these operating system routines. +In system call emulation mode, the simulated application uses the system call interface of the host operating system, ignoring the timing effects of these operating system routines. Consequently, the full system mode provides better accuracy, while the system call emulation mode takes less time to complete the simulation. 
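The single-jump-counter behavior from the vm.tex hunk above can be mimicked by a toy interpreter. The instruction encoding below is invented for illustration and does not claim to match the actual \aca{fimdram} instruction set:

```python
# Toy interpreter for non-nested JUMP handling with one jump counter;
# the ("JUMP", offset, count) encoding is an assumption for this example.
def run(program):
    pc, jump_counter, trace = 0, None, []
    while pc < len(program):
        op = program[pc]
        if op[0] == "JUMP":
            _, offset, count = op
            if jump_counter is None:     # first encounter: initialize counter
                jump_counter = count
            jump_counter -= 1
            if jump_counter > 0:         # not yet zero: take the jump
                pc += offset             # offset is negative for a loop
                continue
            jump_counter = None          # counter reached zero: fall through
        else:
            trace.append(op)
        pc += 1
    return trace
```

Because there is only one `jump_counter`, a JUMP inside another JUMP's loop body would clobber the outer counter, which is exactly why this scheme works only for non-nested JUMP instructions.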
In addition to the integrated components of the platform, gem5 provides a SystemC \ac{api} to enable the use of external SystemC models.