Fix bug in Simulation chapter

2024-02-27 19:10:22 +01:00
parent 6dc73c0b04
commit f0014161d9
1 changed files with 1 additions and 1 deletions
--- a/src/chapters/results.tex
+++ b/src/chapters/results.tex
@@ -16,7 +16,7 @@ The external clocking of the memory bus itself is $\qty{4}{\times}$ higher with
 Thus, with both the 16-wide \ac{fp} adder and the 16-wide \ac{fp} multiplier, a single processing unit achieves a throughput of $\num{2} \cdot \qty{16}{FLOP} \cdot \qty{250}{\mega\hertz}=\qty{8}{\giga FLOPS}$.
 In total, the 16 processing units in a memory channel provide a throughput of $\num{16}\cdot\qty{8}{\giga FLOPS}=\qty{128}{\giga FLOPS}$.
 To compare this throughput to the vector processing unit of a real processor, a highly simplified assumption can be made based on the ARM NEON architecture that holds 8 \ac{fp16} numbers in a single $\qty{128}{\bit}$ vector register \cite{arm2020}.
-Assuming the single processor core runs at a frequency of $\qty{3}{\giga\hertz}$, the vector processing unit can achieve a maximum throughput of $\qty{8}{FLOP} \cdot \qty{3}{\giga\hertz}=\qty{24}{FLOPS}$, which is about $\qty{5}{\times}$ less than the \aca{fimdram} throughput of a single channel.
+Assuming a single processor core running at $\qty{3}{\giga\hertz}$ that issues one such vector operation per cycle, its vector processing unit achieves a peak throughput of $\qty{8}{FLOP} \cdot \qty{3}{\giga\hertz}=\qty{24}{\giga FLOPS}$, which is about $\qty{5}{\times}$ lower than the \aca{fimdram} throughput of a single channel.

 % some implementation details
 % hbm size, channel...
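The corrected figure can be checked by recomputing every peak-throughput value from the simplified assumptions stated in the text. These assumptions are: 16-wide FP adder plus 16-wide FP multiplier per processing unit at 250 MHz, 16 processing units per channel, and a 128-bit NEON register holding 8 FP16 values on a 3 GHz core. The core is assumed to issue one vector operation per cycle. This is a back-of-the-envelope sketch, not part of the thesis sources. The function names (`pim_unit_gflops`, `pim_channel_gflops`, `neon_core_gflops`) are hypothetical.

```python
"""Back-of-the-envelope check of the peak FP16 throughput figures.

All default parameters mirror the simplified assumptions stated in the text;
the function names are hypothetical and chosen only for illustration.
"""


def pim_unit_gflops(simd_width: int = 16, fp_units: int = 2, clock_mhz: float = 250.0) -> float:
    """Peak GFLOPS of one processing unit: FP units x SIMD lanes x clock."""
    return fp_units * simd_width * clock_mhz / 1000.0


def pim_channel_gflops(units_per_channel: int = 16) -> float:
    """Peak GFLOPS of all processing units in one memory channel."""
    return units_per_channel * pim_unit_gflops()


def neon_core_gflops(register_bits: int = 128, element_bits: int = 16, clock_ghz: float = 3.0) -> float:
    """Peak GFLOPS of one core, assuming one 128-bit FP16 vector op per cycle."""
    lanes = register_bits // element_bits  # 8 FP16 values per 128-bit register
    return lanes * clock_ghz


if __name__ == "__main__":
    unit = pim_unit_gflops()
    channel = pim_channel_gflops()
    cpu = neon_core_gflops()
    print(f"processing unit: {unit:.0f} GFLOPS")     # 8 GFLOPS
    print(f"channel:         {channel:.0f} GFLOPS")  # 128 GFLOPS
    print(f"CPU core:        {cpu:.0f} GFLOPS")      # 24 GFLOPS
    print(f"ratio:           {channel / cpu:.2f}x")  # 5.33x
```

The ratio of 128 to 24 GFLOPS is about 5.33, consistent with the "about 5×" stated in the text once the CPU figure carries the giga prefix.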