From 66d2aaacafd86ca71c64e4405ec5c61dcb4bf40c Mon Sep 17 00:00:00 2001
From: Derek Christ <christ.derek@gmail.com>
Date: Wed, 24 Jan 2024 19:33:45 +0100
Subject: [PATCH] Minor improvements/fixes

---
 src/chapters/introduction.tex | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/src/chapters/introduction.tex b/src/chapters/introduction.tex
index 7a51aea..c278419 100644
--- a/src/chapters/introduction.tex
+++ b/src/chapters/introduction.tex
@@ -2,7 +2,7 @@
 \label{sec:introduction}
 
 Emerging applications such as \acp{llm} revolutionize modern computing and fundamentally change how we interact with computing systems.
-An important compound of these models make use of \acp{dnn}, which are a type of machine learning model inspired by the structure of the human brain - composed of multiple layers of interconnected nodes that mimic a network of neurons, \acp{dnn} are utilized to perform various tasks such as image recognition or natural language and speech processing.
+These models rely heavily on \acp{dnn}, which are a type of machine learning model inspired by the structure of the human brain - composed of multiple layers of interconnected nodes that mimic a network of neurons, \acp{dnn} are utilized to perform various tasks such as image recognition or natural language and speech processing.
 Consequently, \acp{dnn} make it possible to tackle many new classes of problems that were previously beyond the reach of conventional algorithms.
 
 However, the ever-increasing use of these technologies poses new challenges for hardware architectures, as the energy required to train and run these models reaches unprecedented levels.
@@ -25,11 +25,11 @@ The exponential grow in compute energy will eventually be constrained by market
 It is therefore required to achieve radical improvements in energy efficiency in order to avoid such a scenario.
 
 In recent years, domain-specific accelerators, such as \acp{gpu} or \acp{tpu} have become very popular, as they provide orders of magnitude higher performance and energy efficiency for \ac{ai} applications \cite{kwon2021}.
-However, research must also consider the off-chip memory - the date movement between the computation unit and the \ac{dram} has a high cost as fetching operands costs more than doing the computation on them.
+However, research must also take into account off-chip memory - moving data between the computation unit and the \ac{dram} is very costly, as fetching operands consumes more energy than performing the computation on them.
 While performing a double precision floating point operation on a $\qty{28}{\nano\meter}$ technology might consume an energy of about $\qty{20}{\pico\joule}$, fetching the operands from \ac{dram} consumes almost 3 orders of magnitude more energy at about $\qty{16}{\nano\joule}$ \cite{dally2010}.
 
-Furthermore, many types of \ac{dnn} used for language and speech processing such as \acp{rnn}, \acp{mlp} and some layers of \acp{cnn} are severely limited by the memory-bandwidth that the \ac{dram} can provide, in contrast to compute-intensive workloads such as visual processing \cite{he2020}.
-Such workloads are referred to as \textit{memory-bound}.
+Furthermore, many types of \ac{dnn} used for language and speech processing, such as \acp{rnn}, \acp{mlp} and some layers of \acp{cnn}, are severely limited by the memory bandwidth that the \ac{dram} can provide, making them \textit{memory-bound} \cite{he2020}.
+In contrast, compute-intensive workloads, such as visual processing, are referred to as \textit{compute-bound}.
 
 \begin{figure}[!ht]
 	\centering
@@ -38,16 +38,16 @@ Such workloads are referred to as \textit{memory-bound}.
 	\label{plt:roofline}
 \end{figure}
 
-In the past, specialized types of \ac{dram} such as \ac{hbm} have been able to meet high bandwidth requirements.
-However, recent AI technologies require even greater bandwidth than \ac{hbm} can provide \cite{kwon2021}.
+In the past, specialized types of \ac{dram} such as \ac{hbm} have been able to meet the high bandwidth requirements.
+However, recent \ac{ai} technologies require even greater bandwidth than \ac{hbm} can provide \cite{kwon2021}.
 
-All things considered, to meet the need for energy-efficient computing systems, which are increasingly becoming memory-bound, new approaches to computing are required.
+All things considered, to meet the need for more energy-efficient computing systems, which are increasingly becoming memory-bound, new approaches to computing are required.
 This has led researchers to reconsider past \ac{pim} architectures and advance them further \cite{lee2021}.
 \Ac{pim} integrates computational logic into the \ac{dram} itself, to exploit minimal data movement cost and extensive internal data parallelism \cite{sudarshan2022}.
 
 This work analyzes various \ac{pim} architectures, identifies the challenges of integrating them into state-of-the-art \acp{dram}, examines the changes required in the way applications lay out their data in memory and explores a \ac{pim} implementation from one of the leading \ac{dram} vendors.
-The remainder of it is structured as follows:
-Section \ref{sec:dram} gives a brief overview of the architecture of \acp{dram}, in detail that of \acp{hbm}.
+The remainder of this work is structured as follows:
+Section \ref{sec:dram} gives a brief overview of the architecture of \acp{dram}, in particular that of \ac{hbm}.
 In section \ref{sec:pim} various types of \ac{pim} architectures are presented, with some concrete examples discussed in detail.
 Section \ref{sec:vp} is an introduction to virtual prototyping and system-level hardware simulation.
 After explaining the necessary prerequisites, section \ref{sec:implementation} implements a concrete \ac{pim} architecture in software and provides a development library that applications can use to take advantage of in-memory processing.