Fixes from Niklas, Johannes, Hendrik

This commit is contained in:
2022-08-05 10:54:38 +02:00
parent 98add62119
commit 27ec50fab7
8 changed files with 48 additions and 53 deletions

@@ -1,17 +1,17 @@
\section{Future Work}
\section{Conclusion and Future Work}
\label{sec:future_work}
Due to the complexity of possible memory subsystem configurations, simulation is an indispensable part of the development process of today's systems.
It not only has a high impact on development cost but also significantly reduces the time-to-market and enables the rapid release of new products.
However, the accurate simulation of a specific application takes a long time because of the detailed processor core models.
On the other hand, fixed or relative time memory traces allow faster simulation at the expense of accuracy, which often makes them unsuitable.
To fill this gap, this thesis introduced a new simulation frontend for DRAMSys, that is fast and makes only few compromises on accuracy.
To fill this gap, this thesis introduced a new simulation frontend for DRAMSys, which speeds up the process while making only a few compromises on accuracy.
In conclusion, the newly developed instrumentation tool provides an flexible way of generating traces for arbitrary multi-threaded applications.
In conclusion, the newly developed instrumentation tool provides a flexible way of generating traces for arbitrary multi-threaded applications.
The mature DRAMSys simulator framework can then be used to explore the design space and vary numerous configuration parameters of the DRAM subsystem to find a well-suited set of options.
It was shown that in comparison to the well-established full-system simulation framework gem5, only some deviations have to be accepted.
Also, the Pin-Tool based memory access tracing of the Ramulator DRAM simulator was compared to the new fronted. %(briefly summarize the results here)
Also, the Pin-Tool based memory access tracing of the Ramulator DRAM simulator was compared to the new frontend. %(briefly summarize the results here)
Although Ramulator takes a slightly different approach to trace generation than this thesis, a very good correlation in the results could be demonstrated.
A noteworthy advantage of the newly developed tool is its support for all hardware architectures that DynamoRIO provides (currently IA-32, x86-64, ARM, and AArch64) in contrast to the supported architectures of Pin (IA-32 and x86-64).
@@ -23,7 +23,7 @@ As mentioned in \ref{sec:cache_implementation}, the cache models do not yet guar
Although this can be a complex task, it is possible to implement this in future work.
A less impactful inaccuracy results from the scheduling of the application's threads in the new simplified core models.
While an application can spawn a arbitrary number of threads, the platform may not be able to process them all in parallel.
While an application can spawn an arbitrary number of threads, the platform may not be able to process them all in parallel.
Currently, the new trace player does not take this into account and runs all threads in parallel.
This deviation could be prevented by recording the processor cores used on the initial system and using this information to better match the scheduling.
@@ -42,5 +42,3 @@ In the future, the DynamoRIO tool could decode those computational instructions
One significant improvement that could still be made is the consideration of dependencies between the memory accesses.
Similar to the elastic trace player of gem5 \cite{Jagtap2016}, which captures data load and store dependencies by instrumenting a detailed out-of-order processor model, the DynamoRIO tool could create a dependency graph of the memory accesses using the decoded instructions.
By using this technique, it would also be possible to model the out-of-order behavior of modern processors and make the simulation more accurate, whereas the current implementation is entirely in-order.
These mentioned potential improvements could make the new simulation frontend for DRAMSys even more accurate.