Clarify layout of additional matrix rows
As can be seen in \cref{img:memory_layout}, a processing unit is responsible for multiplying one row of the matrix with the input vector and accumulating the products over eight cycles, forming the partial sum.
This example only demonstrates the execution for the native matrix dimensions of a single \ac{pch}.
Increasing the number of matrix rows requires additional iterations of this 8-cycle microkernel, each supplied with the memory addresses of the subsequent matrix rows.
However, the additional matrix rows must be stored as separate 8-row matrices placed after the first 8-row matrix block, so that the weight matrix is laid out as an array of 8-row matrices.
As a side effect of incrementing the matrix row address, the \ac{grf}-B index is incremented as well. This makes it possible to increase the maximum number of matrix rows to $8 \cdot 8 = 64$ before all eight \ac{grf}-B entries are filled with partial sums, as demonstrated in \cref{lst:gemv64}.
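The row layout described above can be illustrated with a small functional model. This is a sketch under stated assumptions, not the implementation used in this work: it assumes eight processing units per \ac{pch} (one per row of an 8-row block), eight \ac{grf}-B entries per processing unit, and a hypothetical width of \texttt{LANES} = 16 columns per \ac{mac} cycle. It collapses the lanes of each cycle into one scalar and ignores timing. The names \texttt{split\_into\_blocks} and \texttt{gemv\_pch} are hypothetical.

```python
ROWS_PER_BLOCK = 8    # native 8-row matrix block, one row per processing unit (assumed)
GRF_B_ENTRIES = 8     # GRF-B entries per processing unit
CYCLES_PER_ROW = 8    # MAC cycles needed to process one row
LANES = 16            # hypothetical number of columns consumed per cycle
NATIVE_COLS = CYCLES_PER_ROW * LANES
MAX_ROWS = ROWS_PER_BLOCK * GRF_B_ENTRIES  # 8 * 8 = 64


def split_into_blocks(matrix: list[list[float]]) -> list[list[list[float]]]:
    """Lay out the rows as an array of separate 8-row matrices."""
    return [matrix[i:i + ROWS_PER_BLOCK] for i in range(0, len(matrix), ROWS_PER_BLOCK)]


def gemv_pch(matrix: list[list[float]], vector: list[float]) -> list[float]:
    """Functional (not cycle-accurate) model of the microkernel on one pCH."""
    if len(vector) != NATIVE_COLS:
        raise ValueError(f"expected {NATIVE_COLS} input vector entries")
    blocks = split_into_blocks(matrix)
    if len(blocks) > GRF_B_ENTRIES:
        raise ValueError(f"at most {MAX_ROWS} rows fit into the GRF-B entries")
    # grf_b[pu][index]: partial sum of the row handled by processing unit `pu`
    # in the 8-cycle iteration that writes GRF-B entry `index`
    grf_b = [[0.0] * GRF_B_ENTRIES for _ in range(ROWS_PER_BLOCK)]
    for grf_b_index, block in enumerate(blocks):  # next block = next row address
        for pu, row in enumerate(block):          # processing units run in parallel
            for cycle in range(CYCLES_PER_ROW):
                lo, hi = cycle * LANES, (cycle + 1) * LANES
                grf_b[pu][grf_b_index] += sum(
                    w * x for w, x in zip(row[lo:hi], vector[lo:hi])
                )
    # Read the partial sums back in the original row order
    return [
        grf_b[pu][grf_b_index]
        for grf_b_index, block in enumerate(blocks)
        for pu in range(len(block))
    ]
```

In this model, the block index selects both the row address and the \ac{grf}-B entry, so a 65th row has no register entry left and is rejected.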
\begin{listing}
\begin{verbatim}
MAC(AAM) GRF_B, BANK, GRF_A
JUMP -1, 63
\end{verbatim}
\caption[The core of a \ac{mac} microkernel that utilizes the maximum number of register entries]{The core of a \ac{mac} microkernel that utilizes the maximum number of register entries.}
\label{lst:gemv64}
\end{listing}
A further increase in the total number of rows can be achieved by distributing the weight matrix over multiple \acp{pch}, running the microkernel multiple times, and concatenating the resulting output vectors on the host.
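Distributing the rows over several \acp{pch} can be sketched as follows, assuming that one microkernel run on a \ac{pch} covers at most 64 rows. The per-\ac{pch} computation is replaced by plain dot products, and the round-robin assignment of row slices in the hypothetical helper \texttt{plan\_row\_slices} is an illustrative choice, not necessarily the scheme used in this work.

```python
MAX_ROWS_PER_PCH = 8 * 8  # eight GRF-B entries, each covering one 8-row block


def pch_gemv(rows: list[list[float]], vector: list[float]) -> list[float]:
    """Stand-in for one microkernel run on a single pCH (plain dot products)."""
    if len(rows) > MAX_ROWS_PER_PCH:
        raise ValueError("a single microkernel run covers at most 64 rows")
    return [sum(w * x for w, x in zip(row, vector)) for row in rows]


def plan_row_slices(num_rows: int, num_pchs: int) -> list[tuple[int, int, int]]:
    """Split the rows into slices of at most 64 rows, assigned round-robin.

    Returns (pch, start, stop) triples; a pCH that receives several slices
    runs the microkernel several times.
    """
    if num_pchs < 1:
        raise ValueError("need at least one pCH")
    return [
        (run % num_pchs, start, min(start + MAX_ROWS_PER_PCH, num_rows))
        for run, start in enumerate(range(0, num_rows, MAX_ROWS_PER_PCH))
    ]


def gemv_multi_pch(matrix: list[list[float]], vector: list[float], num_pchs: int) -> list[float]:
    output: list[float] = []
    for _pch, start, stop in plan_row_slices(len(matrix), num_pchs):
        # Slices on different pCHs could run concurrently; the host simply
        # concatenates the partial output vectors in slice order.
        output.extend(pch_gemv(matrix[start:stop], vector))
    return output
```

Because every slice produces the outputs of a disjoint set of rows, concatenation on the host is sufficient; no addition is needed.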
To increase the number of columns, new entries of the input vector must be loaded into the processing units.
Therefore, the entire \ac{gemv} microkernel must be executed several times with different input vector chunks and the corresponding weight matrix columns, and the resulting output vectors must be merged by adding them element-wise on the host.
In general, the further the matrix dimensions exceed the native \ac{pim} matrix dimensions, the more often the \ac{mac} core of the \ac{gemv} microkernel must be executed.
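The column splitting and the resulting number of \ac{mac} core iterations can be sketched as follows. The native column count \texttt{NATIVE\_COLS} = 128 is a hypothetical value, each microkernel run is replaced by plain dot products, and the hypothetical helper \texttt{mac\_core\_iterations} counts 8-row iterations without considering parallelism across \acp{pch}.

```python
import math

ROWS_PER_BLOCK = 8    # rows covered by one 8-cycle MAC core iteration
NATIVE_COLS = 128     # hypothetical number of input vector entries loaded at once


def gemv_column_chunks(matrix: list[list[float]], vector: list[float],
                       native_cols: int = NATIVE_COLS) -> list[float]:
    output = [0.0] * len(matrix)
    for start in range(0, len(vector), native_cols):
        stop = start + native_cols
        chunk = vector[start:stop]  # new input vector entries for the processing units
        # Stand-in for one complete microkernel run on the matching weight columns
        partial = [sum(w * x for w, x in zip(row[start:stop], chunk)) for row in matrix]
        # Host-side merge: element-wise addition of the partial output vectors
        output = [o + p for o, p in zip(output, partial)]
    return output


def mac_core_iterations(rows: int, cols: int, native_cols: int = NATIVE_COLS) -> int:
    """8-row iterations times column chunks needed to cover a rows x cols matrix."""
    return math.ceil(rows / ROWS_PER_BLOCK) * math.ceil(cols / native_cols)
```

Unlike the row split, each column chunk yields partial sums for the same rows, so the host adds the partial output vectors instead of concatenating them. Under these assumptions, a $200 \times 300$ matrix needs $\lceil 200/8 \rceil \cdot \lceil 300/128 \rceil = 25 \cdot 3 = 75$ iterations.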
\subsubsection{Performance and Power Efficiency Effects}