Smaller fixes

2024-03-13 14:45:04 +01:00
parent e0bd8a6cf3
commit 5235553962
2 changed files with 3 additions and 3 deletions


@@ -50,7 +50,7 @@ Consequently, the default memory region and the \ac{pim} memory region are locat
In order to enable the on-chip caches and therefore be able to use the \ac{dram}, the page tables have to be set up, which are then used by the \ac{mmu} to map addresses between the virtual memory space and the physical memory space.
To simplify the virtual-physical translation, the \ac{dram} address space should only be mapped as a block at a certain offset in the virtual address space.
In the attributes of the page table, each mapped block of address space can be assigned a special cache policy, such as cacheable and non-cacheable.
-While most of the \ac{dram} area are should be a normal, cacheable memory region, the \ac{pim} region should be marked as a non-cacheable memory for reasons explained in \cref{sec:microkernel_execution}.
+While most of the \ac{dram} region should be a normal, cacheable memory region, the \ac{pim} region should be marked as non-cacheable memory for reasons explained in \cref{sec:microkernel_execution}.
Furthermore, special memory-mapped devices such as the \ac{uart}, which is used to print logging messages to the \ac{stdout}, must be marked as a non-cacheable device region, as otherwise the log messages may get held in the cache and not be written until the cache line is eventually flushed.
In the AArch64 execution mode, the operating system can choose from three different granule sizes for the translation tables: $\qty{4}{\kibi\byte}$, $\qty{16}{\kibi\byte}$ and $\qty{64}{\kibi\byte}$.
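
The cache-policy assignment described in this hunk can be sketched as follows. This is a minimal model, not the thesis' implementation: the region base addresses, the MAIR index assignment and the helper names are hypothetical, and shareability and access-permission bits are omitted. Only the MAIR attribute encodings and the descriptor bit positions (valid/block bits, AttrIndx in bits 4:2, access flag in bit 10) follow the ARMv8-A architecture, assuming a 4 KiB granule with 2 MiB level-2 blocks.

```python
# Architectural MAIR_EL1 attribute encodings (ARMv8-A).
ATTR_DEVICE_NGNRNE = 0x00   # device memory, e.g. the UART
ATTR_NORMAL_NC = 0x44       # normal memory, non-cacheable (PIM region)
ATTR_NORMAL_WB = 0xFF       # normal memory, write-back cacheable (DRAM)

# Hypothetical assignment of attributes to MAIR index slots.
MAIR_INDEX = {"device": 0, "normal_nc": 1, "normal_wb": 2}
MAIR_ATTRS = {0: ATTR_DEVICE_NGNRNE, 1: ATTR_NORMAL_NC, 2: ATTR_NORMAL_WB}

BLOCK_SIZE = 2 * 1024 * 1024  # level-2 block size with a 4 KiB granule


def mair_value() -> int:
    """Pack the 8-bit attributes into one MAIR_EL1 register value."""
    value = 0
    for index, attr in MAIR_ATTRS.items():
        value |= attr << (8 * index)
    return value


def block_descriptor(phys_addr: int, attr_index: int) -> int:
    """Build a simplified level-2 block descriptor for one 2 MiB block."""
    if phys_addr % BLOCK_SIZE != 0:
        raise ValueError("block base address must be 2 MiB aligned")
    valid_block = 0b01        # bits [1:0]: valid block entry
    access_flag = 1 << 10     # AF, set to avoid an access-flag fault
    return phys_addr | access_flag | (attr_index << 2) | valid_block


# Hypothetical base addresses, chosen only for illustration.
dram_entry = block_descriptor(0x4000_0000, MAIR_INDEX["normal_wb"])
pim_entry = block_descriptor(0x8000_0000, MAIR_INDEX["normal_nc"])
uart_entry = block_descriptor(0x0900_0000, MAIR_INDEX["device"])
```

The three entries differ only in the base address and the attribute index, which is the only place where the cache policy of a block is selected.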


@@ -173,7 +173,7 @@ Due to the focus on \ac{dnn} applications in \aca{fimdram}, the native data type
In addition, \ac{fp16} is well-supported on modern processor architectures such as ARMv8, which not only include \ac{fp16} \acp{fpu} themselves, but also support \ac{simd} operations using special vector registers.
The \ac{simd} \acp{fpu} of the processing units is implemented once as a \ac{fp16} multiplier unit, and once as a \ac{fp16} adder unit, providing support for these basic algorithmic operations.
In addition to the \acp{fpu}, a processing unit consists also of \acp{crf}, \acp{srf} and \acp{grf}.
-The \ac{crf} acts as an instruction buffer, holding the 32 32-bit instructions to be executed by the processor when performing a memory access.
+The \ac{crf} acts as an instruction buffer, holding the 32 instructions to be executed by the processor when performing a memory access.
One program that is stored in the \ac{crf} is called a \textit{microkernel}.
As explained earlier, the operands of an instruction come either directly from the bank or from the \acp{srf} or \acp{grf}.
Each \ac{grf} consists of 16 registers, each with the \aca{hbm} prefetch size of 256 bits, where each entry can hold the data of a full memory burst.
@@ -270,7 +270,7 @@ One solution to this problem would be to introduce memory barriers between each
However, this comes at a significant performance cost and results in memory bandwidth being underutilized because the host processor has to wait for every memory access to complete.
Disabling memory controller reordering completely, on the other hand, interferes with non-\ac{pim} traffic and significantly reduces its performance.
-To solve this overhead, Samsung has introduced the \acf{aam} mode for arithmetic instructions.
+To solve this overhead, Samsung has introduced the \acf{aam} for arithmetic instructions.
In the \ac{aam} mode, the register indices of an instruction are ignored and decoded from the column and row address of the memory access itself, as demonstrated in \cref{img:aam}.
With this method, the register indices and the bank address cannot get out of sync, as they are tightly coupled, even if the memory controller reorders the order of the accesses.
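
The coupling between address and register index described in this hunk can be illustrated with a small model. The concrete mapping is an assumption made for illustration only: the text does not state which address bits are used, so the sketch takes the low three bits of the row and column address, and the function name `aam_register_indices` is hypothetical.

```python
from typing import Tuple


def aam_register_indices(row: int, column: int, index_bits: int = 3) -> Tuple[int, int]:
    """Derive two register indices from the address of the access itself.

    Assumed mapping (not taken from the source): the low bits of the row
    address select one register, the low bits of the column address the other.
    """
    mask = (1 << index_bits) - 1
    return (row & mask, column & mask)


# Eight accesses to consecutive columns of row 0.
accesses = [(0, column) for column in range(8)]

# Indices as seen when the accesses arrive in program order ...
in_order = {addr: aam_register_indices(*addr) for addr in accesses}
# ... and when the memory controller has reversed them.
reordered = {addr: aam_register_indices(*addr) for addr in reversed(accesses)}

# Each address selects the same registers regardless of arrival order.
assert in_order == reordered
```

Because the indices are a pure function of the address, no counter or other state inside the processing unit can drift when accesses are reordered.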