Smaller refactorings in result chapter
@@ -8,7 +8,7 @@ A working \ac{vp} of \aca{fimdram}, in the form of a software model, has been de
This made it possible to explore the performance gain of \ac{pim} for different workloads in a simple and flexible way.
It was found that \ac{pim} can provide a speedup of up to $\qty{23.9}{\times}$ for level 1 \ac{blas} vector operations and up to $\qty{62.5}{\times}$ for level 2 \ac{blas} operations.
-While these results may not strictly represent a real-world system, an achievable upper bound of speedups of $\qty{17.6}{\times}$ and $\qty{9.0}{\times}$ could be determined using a hypothetical infinite compute system.
+While these results may not strictly represent a real-world system, an achievable speedup of $\qty{17.6}{\times}$ and $\qty{9.0}{\times}$ could be determined using a hypothetical infinite compute system.
This achieved speedup of $\qty{9.0}{\times}$ for the \ac{gemv} routine largely matches the number of Samsung's real-world implementation of \aca{fimdram} at about $\qty{8.3}{\times}$.
In addition to the numbers presented by Samsung, the same simulation workloads were run on two real \ac{gpu} systems, both with \aca{hbm}, and their runtimes were compared.