Fix bug in Simulation chapter
@@ -16,7 +16,7 @@ The external clocking of the memory bus itself is $\qty{4}{\times}$ higher with
Thus, with both the 16-wide \ac{fp} adder and the 16-wide \ac{fp} multiplier, a single processing unit achieves a throughput of $\num{2} \cdot \qty{16}{FLOP} \cdot \qty{250}{\mega\hertz}=\qty{8}{\giga FLOPS}$.
In total, the 16 processing units in a memory channel provide a throughput of $\num{16}\cdot\qty{8}{\giga FLOPS}=\qty{128}{\giga FLOPS}$.
To compare this throughput to the vector processing unit of a real processor, a highly simplified assumption can be made based on the ARM NEON architecture that holds 8 \ac{fp16} numbers in a single $\qty{128}{\bit}$ vector register \cite{arm2020}.
-Assuming the single processor core runs at a frequency of $\qty{3}{\giga\hertz}$, the vector processing unit can achieve a maximum throughput of $\qty{8}{FLOP} \cdot \qty{3}{\giga\hertz}=\qty{24}{FLOPS}$, which is about $\qty{5}{\times}$ less than the \aca{fimdram} throughput of a single channel.
+Assuming the single processor core runs at a frequency of $\qty{3}{\giga\hertz}$, the vector processing unit can achieve a maximum throughput of $\qty{8}{FLOP} \cdot \qty{3}{\giga\hertz}=\qty{24}{\giga FLOPS}$, which is about $\qty{5}{\times}$ less than the \aca{fimdram} throughput of a single channel.
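As a rough cross-check of the stated factor, and still under the simplified assumption of one 8-wide \ac{fp16} vector operation per clock cycle, the ratio of the two peak throughputs is
\[
  \frac{\qty{128}{\giga FLOPS}}{\qty{24}{\giga FLOPS}} \approx \num{5.3},
\]
which is consistent with the factor of about $\qty{5}{\times}$ quoted above.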
% some implementation details
% hbm size, channel...