From aed31f282da9caac210e195568ba1967c22c7ce6 Mon Sep 17 00:00:00 2001
From: "christ.derek"
Date: Sat, 31 Aug 2024 07:32:35 +0000
Subject: [PATCH] Update on Overleaf.

---
 samplepaper.tex | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/samplepaper.tex b/samplepaper.tex
index 0b10589..a68ffc9 100644
--- a/samplepaper.tex
+++ b/samplepaper.tex
@@ -47,7 +47,8 @@
 %%
 % \documentclass[manuscript, screen, review]{acmart}
 % \documentclass[sigconf, review, anonymous]{acmart}
-\documentclass[sigconf]{acmart}
+\documentclass[sigconf, nonacm=true]{acmart}
+\setcopyright{none}
 
 %%
 %% \BibTeX command to typeset BibTeX logo in the docs
@@ -150,7 +151,7 @@
 \begin{document}
 
 %
-\title[PIMSys: A Virtual Prototype for Processing in Memory]{PIMSys:\\A Virtual Prototype for Processing in Memory}
+\title{PIMSys: A Virtual Prototype for Processing in Memory}
 
 %%
 %% The "author" command and its associated commands are used to define
@@ -160,7 +161,7 @@
 %% used to denote shared contribution to the research.
 \author{Derek Christ}
 \email{derek.christ@iese.fraunhofer.de}
-% \orcid{1234-5678-9012}
+\orcid{0009-0005-4234-6362}
 \affiliation{%
   \institution{Fraunhofer IESE}
   \city{Kaiserslautern}
@@ -169,6 +170,7 @@
 
 \author{Lukas Steiner}
 \email{lukas.steiner@rptu.de}
+\orcid{0000-0003-2677-6475}
 \affiliation{%
   \institution{RPTU Kaiserslautern-Landau}
   \city{Kaiserslautern}
@@ -176,6 +178,7 @@
 
 \author{Matthias Jung}
 \email{m.jung@uni-wuerzburg.de}
+\orcid{0000-0003-0036-2143}
 \affiliation{%
   \institution{JMU Würzburg}
   \city{Würzburg}
@@ -183,6 +186,7 @@
 
 \author{Norbert Wehn}
 \email{norbert.wehn@rptu.de}
+\orcid{0000-0002-9010-086X}
 \affiliation{%
   \institution{RPTU Kaiserslautern-Landau}
   \city{Kaiserslautern}
@@ -256,9 +260,9 @@ UPMEM integrates standard DDR4 DIMM-based DRAM with a series of PIM-enabled UPME
 Each \ac{pim} chip houses eight \acp{dpu}, each with dedicated access to a 64 MiB memory bank, a 24 KiB instruction memory, and a 64 KiB scratchpad memory.
 These \acp{dpu} function as multithreaded 32-bit \ac{risc} cores, featuring a complete set of general-purpose registers and a 14-stage pipeline~\cite{gomhaj_21}.
 Even prior to UPMEM, Micron introduced its automata processor \cite{wang2016}.
-It features a nondeterministic finite automaton (NFA) inside the \ac{dram} to accelerate certain algorithms.
+It features a nondeterministic finite automaton (NFA) inside the DRAM to accelerate certain algorithms.
 In 2020, SK Hynix, a leading DRAM manufacturer, unveiled its \ac{pim} technology, named Newton, utilizing \ac{hbm}~\cite{he2020}.
-Unlike UPMEM, Newton integrates small MAC units and buffers into the bank area of the DRAM to mitigate the space and power overhead of a fully programmable processor core.
+Unlike UPMEM, Newton integrates small MAC units and buffers into the bank area of the DRAM to mitigate the area and power overhead of a fully programmable processor core.
 Following SK Hynix's lead, Samsung, another major DRAM manufacturer, announced its own \ac{pim} DRAM implementation named \ac{fimdram} one year later~\cite{lee2021}.
 With these new architectures on the horizon, it becomes crucial for system-level designers to assess whether these promising developments can enhance their applications.
 Furthermore, these emerging hardware architectures necessitate new software paradigms. It remains unclear whether libraries, compilers, or operating systems will effectively manage these new devices at the software level. Therefore, it is imperative to establish comprehensive virtual platforms for these devices, enabling real applications to be tested within a realistic architectural and software platform context.
@@ -293,7 +297,7 @@ A slightly different approach is taken by PiMulator \cite{mosanu2022}, which doe
 In addition to \ac{pim} architectures from research, there are also virtual prototypes of industry architectures.
 Very recently, the authors of \cite{hyun2024} introduced uPIMulator, a cycle-accurate simulator that models UPMEM's real-world general-purpose \ac{pim} architecture.
 In addition to its automata processor, Micron introduced another \ac{pim} architecture called In-Memory Intelligence~\cite{finkbeiner2017}.
-The new architecture places bit-serial computing elements at the sense amplifier level of a memory array.
+The new architecture places bit-serial computing elements at the sense amplifier level of the memory array.
 Evaluations of In-Memory Intelligence are based on a custom Micron discrete event simulator that implements the hardware models.
 Similarly, to analyze the potential performance and power impact of Newton, SK~Hynix developed a virtual prototype based on the DRAMSim2~\cite{rosenfeld2011} cycle-accurate memory simulator, which models a \ac{hbm2} memory and the extended Newton DRAM protocol.
 However, \mbox{DRAMSim2} is more than 10 years old and several orders of magnitude slower than DRAMSys~\cite{steiner2022a}.
@@ -321,7 +325,7 @@ Pairwise multiplication of the input vector and a row of the matrix are used to
 Such an operation, defined in the widely used \ac{blas} library \cite{blas1979}, is also known as a \acs{gemv} routine.
 Because one matrix element is only used exactly once in the calculation of the output vector, there is no data reuse in the matrix.
 Further, as the weight matrices tend to be too large to fit into the on-chip cache, such a \ac{gemv} operation is deeply memory-bound \cite{he2020}.
-As a result, such an operation is a good fit for \ac{pim}.
+Consequently, such an operation is a good fit for \ac{pim}.
 
 Many different \ac{pim} architectures have been proposed by researchers in the past, and more recently real implementations have been introduced by hardware vendors.
 These proposals differ largely in the location of the processing operation, ranging from analog distribution of capacitor charges at the DRAM subarray level to additional processing units at the global I/O level.
@@ -610,6 +614,11 @@ Furthermore, an examination of the wallclock time for simulations comparing non-
 In this work, the first system-level virtual prototype of Samsung's \ac{fimdram} is presented, enabling the rapid exploration and feasibility analysis of various workloads in a realistic and detailed manner.
 Looking ahead, future work should focus on providing estimations on the energy efficiency of the \ac{pim} architecture and on expanding the software framework to a Linux implementation, enabling further research on real-world AI applications.
 
+\section*{Funding}
+%
+This work was partly funded by the German Federal Ministry of Education and
+Research (BMBF) under grant 16ME0934K (DI-DERAMSys).
+
 \bibliographystyle{ACM-Reference-Format}
 \bibliography{references}