%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %% %% %% Please do not use \input{...} to include other tex files. %% %% Submit your LaTeX manuscript as one .tex document. %% %% %% %% All additional figures and files should be attached %% %% separately and not embedded in the \TeX\ document itself. %% %% %% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % see https://www.springer.com/journal/10766/submission-guidelines#Instructions%20for%20Authors_Title%20Page for submission guidelines %%\documentclass[referee,sn-basic]{sn-jnl}% referee option is meant for double line spacing %%=======================================================%% %% to print line numbers in the margin use lineno option %% %%=======================================================%% %%\documentclass[lineno,sn-basic]{sn-jnl}% Basic Springer Nature Reference Style/Chemistry Reference Style %%======================================================%% %% to compile with pdflatex/xelatex use pdflatex option %% %%======================================================%% %%\documentclass[pdflatex,sn-basic]{sn-jnl}% Basic Springer Nature Reference Style/Chemistry Reference Style % necessary hack to load tikz because Springer Nature uses the "program" package which results in errors % see https://tex.stackexchange.com/a/615043 \RequirePackage[dvipsnames]{xcolor} \RequirePackage{tikz} %%\documentclass[sn-basic]{sn-jnl}% Basic Springer Nature Reference Style/Chemistry Reference Style \documentclass[sn-mathphys]{sn-jnl}% Math and Physical Sciences Reference Style %%\documentclass[sn-aps]{sn-jnl}% American Physical Society (APS) Reference Style %%\documentclass[sn-vancouver]{sn-jnl}% Vancouver Reference Style %%\documentclass[sn-apa]{sn-jnl}% APA Reference Style %%\documentclass[sn-chicago]{sn-jnl}% Chicago-based Humanities Reference Style %%\documentclass[sn-standardnature]{sn-jnl}% Standard Nature Portfolio Reference Style %%\documentclass[default]{sn-jnl}% Default 
%%\documentclass[default,iicol]{sn-jnl}% Default with double column layout %%%% Standard Packages \usepackage[dvipsnames]{xcolor} \newcommand\todo[1]{\textcolor{red}{#1}} \newcommand\new[1]{\textcolor{blue}{#1}} \newcommand\newer[1]{\textcolor{Green}{#1}} \newcommand\reviewer[1]{ \textcolor{gray}{\textit{#1}\vspace{0.25cm}} } \newcommand\answer[1]{ #1\vspace{0.25cm} } \usepackage{graphicx} \usepackage{tabularray} \usepackage{siunitx} \DeclareSIUnit\transfer{T} \sisetup{per-mode = symbol} \usepackage{amsmath} \usepackage{ifthen} %\usepackage{tikz} \usetikzlibrary{positioning} \usetikzlibrary{backgrounds} \usetikzlibrary{arrows.meta} \usepackage{subcaption} \usepackage{minted} \definecolor{LightGray}{gray}{0.9} \usepackage{pgfplots} \pgfplotsset{compat=1.9} \usepackage{circuitikz} \usetikzlibrary{fit} \usetikzlibrary{calc} \lstset{ literate={~} {$\sim$}{1} } %\usepackage[hidelinks]{hyperref} --> bereits in template geladen %%%% %%%%%=============================================================================%%%% %%%% Remarks: This template is provided to aid authors with the preparation %%%% of original research articles intended for submission to journals published %%%% by Springer Nature. The guidance has been prepared in partnership with %%%% production teams to conform to Springer Nature technical requirements. %%%% Editorial and presentation requirements differ among journal portfolios and %%%% research disciplines. You may find sections in this template are irrelevant %%%% to your work and are empowered to omit any such section if allowed by the %%%% journal you intend to submit to. The submission guidelines and policies %%%% of the journal take precedence. A detailed User Manual is available in the %%%% template package for technical guidance. 
%%%%%=============================================================================%%%%
\jyear{2022}%
\raggedbottom
%%\unnumbered% uncomment this for unnumbered level heads
\begin{document}
\section*{Letter to the Reviewers}
%
Dear Editor, thank you for the valuable reviews of our journal paper. We revised the paper according to the recommendations of the reviewers. We also used the long reviewing time to further improve the quality and presentation of the paper.
%
\subsection*{Reviewer 1}
%
\reviewer{The authors have already presented the System-C-based methodology called Split'n'Cover for hardware safety analysis in a previous publication. This paper extends their work by analyzing a hardware system for automotive applications using LPDDR5 memories. A safety and performance analysis, taking into account the ISO 26262 norm and the new features provided by the LPDDR5, are part of the new content. The results show that the bandwidth and storage overhead derived from the new error correction techniques introduced by the LPDDR5 memories are up to 14\% and 12\%, respectively. In comparison to the previous publication, more than 30\% of the content of the current paper is novel.}
\reviewer{This paper is well-written and based on previous publications. Sections (0) Introduction, (1) Background, (2) Related Work, (3) Methodology, and (4) Implementation are almost the same. No extensions are required so that, initially, the proposed methodology does not change. As in the previous paper, it is easy to understand the proposed methodology and its implementation.}
\reviewer{Section (5) Case Study is new, introducing the new features implemented on the LPDDR5 memory. The authors emphasize the significance of the Link Error Correction Code (Link ECC) in minimizing transmission errors caused by high data rates. A safety model and a performance model are introduced.}
\reviewer{Section (6) presents the safety and performance analysis.
The results are exactly the same as those presented in the previous publication for LPDDR4. As mentioned by the authors, LPDDR5 introduces the Link ECC; Therefore, a more exhaustive explanation of the reason of non-improvement is desired. Please extend (if possible) this part of the paper.}
\answer{We thank the reviewer for their suggestion of a more detailed explanation of the differences between the earlier LPDDR4 analysis and the extended LPDDR5 analysis. We agree that the minor differences in the results that follow from the introduction of the additional Link ECC should be explained in more detail \todo{and have updated Section (7) Experimental Results to reflect this.}}
\reviewer{Section "1" (Introduction) is missing after the abstract.}
\answer{We thank the reviewer for pointing out that the section title for the introduction was missing. This has been corrected accordingly.}
\subsection*{Reviewer 2}
\reviewer{%
- The paper is well-written and easy to follow. However, there is a less uniform text between the old and new text.
}
\answer{We have revised the paper, in particular the abstract and the introduction, and better harmonized the old and new texts.}
\reviewer{%
- The proposed approach is simple but sound and actually well suited for a composable method, aside from the main modeling of a complex platform.
}
\reviewer{%
- The added sentence in the abstract in blue is misleading and does not provide to the reader what is expected by the authors. Instead of forcing the example of LPDRR there should be an added sentence about why the advent of consumer hardware is a major challenge.
}
\reviewer{%
- Similar is for the introduction where there is exactly the same sentence.
}
\answer{We thank the reviewer for pointing out that the added sentence regarding the emergence of consumer hardware in autonomous systems might miss the message we wanted to convey by focusing on the example of LPDDR.
It should more accurately refer to the aspect of new challenges posed by the use of consumer hardware in terms of safety considerations.
We have refined this part of the abstract and introduction to better convey the intended message.}
\reviewer{%
- The added contribution of this new version of the paper is limited. The main methodology is exactly the same as presented in the SAMOS paper, while we have only the LPDDR5 use case instead of LPDDR4. The added performance analysis has nothing to do with the core of the proposed approach, or at least this is my feeling from reading the paper. I think that this is the main issue that the paper has in its current form.
}
\reviewer{%
- The author should probably consider describing a larger proposal to evaluate the impact of possible safety measures since the beginning of the paper.
}
\answer{We agree with the statement that the main methodology of the SystemC-based and ISO26262-compliant safety analysis first presented in the SAMOS paper is largely the same and would like to thank the reviewer for highlighting this.
The additional contribution focuses on the new considerations regarding LPDDR5, such as the new link ECC mechanism added due to the increased interface failure rate.
As correctly noted, the new performance analysis conducted is essentially orthogonal to the previous analysis and examines the bandwidth and latency impact of an additional in-line ECC mechanism.
This analysis was motivated by results suggesting that additional safety mechanisms are needed to achieve a high level of ASIL compliance.
We agree that the introduction should state the paper's intent to analyze a larger proposal for safety analysis compared to the earlier SAMOS paper, and have incorporated this accordingly.
The focus of the paper should now be clearer to the reader from the beginning.}
\subsection*{Reviewer 3}
\reviewer{%
This article describes an approach to computing hardware failure rates using SystemC. For this purpose, the authors implemented specific calculation blocks in SystemC. The authors argue that such an integrative approach is superior to established analysis techniques such as FTA and FMEDA. In general, the approach seems appealing at first glance, as the constructive inclusion of safety aspects in designs has many advantages over a posteriori analyses. And overall, the approach seems worthy of further development. However, as far as the concrete article is concerned, there are some major flaws from a safety perspective that should be revised:
}
\reviewer{%
It starts with the related work section: the authors refer to FMEA. For quantitative analysis, which is the goal of their approach, the correct approach would be FMEDA. For FTA, the authors refer directly to component fault tree analysis. First of all, CFT was not introduced by Adler et al. but by Kaiser et al.: Kaiser, B., Liggesmeyer, P. and Mäckel, O., 2003, October. A New Component Concept for Fault Trees. In Proceedings of the 8th Australian workshop on Safety Critical Systems and Software-Volume 33 (pp. 37-46). Moreover, the aspect of \_component\_ fault trees is not the relevant aspect with which to compare their approach, but the general approach of FTA. Consequently, using the analysis concepts introduced by Adler et al. is the wrong benchmark. For example, it is not necessary to compute MCS as long as no qualitative analysis is required, but modern approaches provide very efficient computational engines for quantitative calculations based on BDDs. Possibly they refer to the integration of safety models to design models, but there is also other approaches following such an approach and this does not seem to be key aspect here.
Furthermore, it is unclear why an FTA would not be appropriate for considering the introduction of new safety measures - this is what the FTA has been used for for decades. Furthermore, the authors ignore other approaches such as Hip-Hops, AltaRica, Markov models, etc. Their approach is nonetheless novel, but a sound related work analysis seems appropriate for an archival publication. Therefore, it is recommended that the authors provide a more accurate description of the state of the art and a clearer distinction from existing work. } \answer{We thank the reviewer for pointing out that the concept of component fault trees was in fact introduced by Kaiser et al. and we adjusted the reference accordingly. We have revised the related work section to more accurately describe the state of the art and the aspects relevant to the approach. } \reviewer{% Regarding the methodology, it is important to note that the key idea of ISO 26262 follows a different direction - a top down guidance for developing safe hardware. The ASILs are derived based on risks. Depending on the ASIL, the standard requires specific measures and mechanisms to be applied constructively in order to sufficiently reduce the residual probability of failure. The metrics were introduced quite late in the standardization process to verify the sufficiency of the applied mechanisms, but the mere compliance of metrics can't replace following the prescribed development process. For example, for the reuse of existing software, ISO 8926 is currently being developed as a dedicated PAS, as measuring metrics is often not considered sufficient evidence. This does not mean that the authors' approach cannot work, but they should proactively address this aspect and show that they understand the basic idea of ISO 26262. } \answer{We agree that our proposed methodology cannot replace the development process described in ISO and thank the reviewer for pointing this out. 
Rather, our approach is intended to more specifically support hardware developers during the design process by eliminating the need for additional translation steps to calculate the ISO-required metrics and by facilitating the understanding of the impact of introduced safety mechanisms.
Note that this is important for a hardware developer (e.g., Tier 1/2) to facilitate a bottom-up integration process where promises (e.g., safety, performance, ...) can be provided to system integrators.
There are certainly other aspects that go into determining the ASIL of a HW component with confidence. However, our approach does not claim to be a comprehensive solution in this respect.
}
\reviewer{%
More critical, however, are some flaws in the math. There's a good reason why fault trees use probabilities instead of failure rates. In the case of Weibull distributions, a constant rate is only given for a certain period of time. For safety, however, we are interested in the worst case, which could be at the beginning or at the end, where a constant rate does not work. Also, one must be very careful not to confuse rates and probabilities. For example, the calculation of lamda\_RF is wrong. According to the description, c is some kind of diagnostic coverage, which is usually a constant probability, not an exponential distribution, i.e., not a rate. Mixing rates and probabilities leads to incorrect results. In this case, the error is on the conservative, i.e. safe side, because multiplying a constant probability by a rate means that the probability grows along the exponential distribution, leading to too high a failure probability. But it leaves the impression that the authors just got lucky. At the very least, they should explicitly state that they are aware of the problem and that they deliberately use a conservative approximation. In fact, diagnostic coverage is only the theoretical maximum that fault detection can achieve.
Error detection can also fail due to random or systematic faults (which does not seem to be considered in their case study either). Therefore, the correct model would include an and-gate in a fault tree that models the error AND that the failure detection fails (with a constant probability modeling the DC OR due to a systematic/random exponentially distributed failure probability). Mathematically, however, the result of an AND gate does not have a constant failure rate. Passing this value on to the next calculation block assuming a constant failure rate can easily lead to calculation errors. As well as a wrong model that only considers an approximation of a constant probability as a factor multiplied by a rate. The same is true for the split block - at least to they mix constant probabilities with failure rates in their case study. It seems recommendable that the authors either rethink their approach of a rate-based calculation (which can easily get tricky) and use probabilities instead. Or, which would probably be the less cumbersome way, to explain (and prove) in more detail why they think they are right, or at least have a conservative approximation.
}
\answer{%
We thank the reviewer for their thorough analysis of the mathematical soundness of our proposed model.
We fully agree that from a strict mathematical standpoint, the constant fault rates are only an approximation and that the mixing of constant probabilities with those rates in turn does not result in constant rates.
However, our approach is oriented towards the metrics and analysis defined in the ISO26262.
Not only do we assume a constant failure rate, as denoted in Section \ref{sec:background} with reference to the constant region of the bathtub curve, but we also leverage the calculation principles of the ISO.
For example, in our approach we calculate the residual fault rates of a coverage block by multiplying a probability with the input fault rate.
This is in line with the calculation given by Formula C.3 in ISO26262-5:
$$\lambda_{RF} \leq \lambda_{RF,est} = \lambda \cdot \left(1-\frac{K_{DC,RF}}{100\%}\right)$$
Similarly, the latent multi-point fault rate of our coverage block is calculated in accordance with Formula C.5:
$$\lambda_{MPF,L} \leq \lambda_{MPF,L,est} = \lambda \cdot \left(1-\frac{K_{DC,MPF,L}}{100\%}\right)$$
Consequently, our approximations are in line with the simplifications made by the ISO, which itself refers to these formulas as conservative approximations.
\todo{To better convey these approximations to the reader, we have now described this aspect in more detail in Section \ref{}}.
Further, we fully agree with the reviewer that the error correction and detection capabilities of coverage mechanisms only denote the theoretical maximum, since they themselves could be a failing hardware component.
In our approach we modeled this circumstance by introducing additional basic events that contribute to the total latent multi-point fault metric, as these are faults that become visible in combination with another independent fault.
}
\reviewer{%
On a more minor note, it would be interesting to see how the approach handles common causes such as heat, EMR, etc. that affect multiple components at once, so that the individual failure rates are no longer independent, which would again lead to incorrect results. Also, the authors only refer to previous work to determine the failure rates of the basic events. However, we know that simply using different manuals to determine the failure rates of hardware parts can easily lead to differences of two orders of magnitude in the top event. Therefore, it would be interesting to see a sensitivity analysis regarding the robustness of their approach to input variances. Especially since the authors use very precise thresholds in their experimental results, e.g. they assign a budget of 53 FIT, i.e.
they talk about 53E-9/h without considering confidence intervals. In terms of evaluation, it would be good to see a comparison of their approach with a traditional safety analysis to prove its correctness.
}
\answer{We thank the reviewer for their suggestion to further analyze the impact of common fault causes, such as heat, that affect multiple components at once.
Indeed, such common causes would result in the fault rates no longer being independent and would require a more thorough analysis.
However, our analysis concentrates on a safety element out of context: the integration of the memory system into the complete vehicle would go beyond the scope of this paper.
Further, we agree that a more extensive sensitivity analysis regarding input variances would be a worthwhile effort that could be the subject of future work.
Regarding a comparison with traditional safety analysis, Steiner et al.~\cite{stekra_21} analyze the corresponding LPDDR4 system with a traditional FTA approach, reaching a very similar result.
% \begin{itemize}
% \item Agree that such analyses would be interesting (failure rates of different basic events with a common cause)
% \item safety element out of context here, no complete analysis of the car -> would go beyond the scope of the paper
% \item Comparison with traditional safety analysis -\> maybe reference the earlier paper "An LPDDR4 Safety Model for Automotive Applications"?
% \end{itemize}
}
\reviewer{%
Overall, however, the approach as such is appealing and the aspects mentioned above seem to be solvable with a reasonable amount of effort and time. For safety, a certain rigor is required to pass a safety assessment, while the article leaves the impression of an inappropriate carelessness when it comes to safety calculations. Therefore, it seems highly recommendable that the authors treat the safety analysis and its math with the appropriate rigor and soundness.
}
\answer{We would like to express our appreciation to the reviewer for recognizing the appeal of our novel approach and for the confidence in its potential.
We also agree with the observation that the approximations involved should be more clearly described in the text, \todo{and have therefore reworked the text to clarify that our approach relies extensively on the estimates made in ISO26262.}}
\newpage
\title[Split'n'Cover: ISO\,26262 Hardware Safety Analysis with SystemC]{Split'n'Cover: ISO\,26262 Hardware Safety Analysis with SystemC}
%%=============================================================%%
%% Prefix -> \pfx{Dr}
%% GivenName -> \fnm{Joergen W.}
%% Particle -> \spfx{van der} -> surname prefix
%% FamilyName -> \sur{Ploeg}
%% Suffix -> \sfx{IV}
%% NatureName -> \tanm{Poet Laureate} -> Title after name
%% Degrees -> \dgr{MSc, PhD}
%% \author*[1,2]{\pfx{Dr} \fnm{Joergen W.} \spfx{van der} \sur{Ploeg} \sfx{IV} \tanm{Poet Laureate}
%% \dgr{MSc, PhD}}\email{iauthor@gmail.com}
%%=============================================================%%
\author*[1]{\fnm{Lukas} \sur{Steiner}}\email{lukas.steiner@rptu.de}
\author[1]{\fnm{Kira} \sur{Kraft}}\email{kira.kraft@rptu.de}
\author[2]{\fnm{Derek} \sur{Christ}}\email{derek.christ@iese.fraunhofer.de}
\author[2]{\fnm{Denis} \sur{Uecker}}\email{denis.uecker@iese.fraunhofer.de}
\author[2]{\fnm{Christian} \sur{Malek}}\email{christian.malek@iese.fraunhofer.de}
\author[2,3]{\fnm{Matthias} \sur{Jung}}\email{matthias.jung@iese.fraunhofer.de}
\author[1]{\fnm{Norbert}~\sur{Wehn}}\email{norbert.wehn@rptu.de}
\affil[1]{\orgdiv{Microelectronics Systems Design Research Group}, \orgname{RPTU Kaiserslautern-Landau}, \orgaddress{\street{Erwin-Schrödinger-Straße 12}, \city{Kaiserslautern}, \postcode{67663}, \state{Rhineland-Palatinate}, \country{Germany}}}
\affil[2]{\orgdiv{Embedded Systems}, \orgname{Fraunhofer IESE}, \orgaddress{\street{Fraunhofer-Platz 1}, \city{Kaiserslautern}, \postcode{67663}, \state{Rhineland-Palatinate},
\country{Germany}}}
\affil[3]{\orgdiv{Computer Engineering}, \orgname{University of Würzburg}, \orgaddress{\street{Am Hubland}, \city{Würzburg}, \postcode{97218}, \state{Bavaria}, \country{Germany}}}
\abstract{
The development of safe hardware is currently a major concern in the automotive industry.
\newer{Due to the high computational and memory requirements of advanced driver-assistance systems and autonomous driving, consumer hardware such as LPDDR DRAM is being deployed in safety-critical areas.}
Parts 5 and 11 of ISO\,26262 define procedures and methods for the development of hardware to achieve a specific automotive safety integrity level (ASIL).
\newer{However, consumer devices like LPDDR DRAMs were not originally intended for use in these applications, so they only achieve low ASIL ratings. Additional safety measures can still be added at system level, but this often comes at the cost of reduced performance.}
In this paper, we present a novel methodology that combines the hardware metrics analysis of ISO\,26262 with SystemC-based virtual prototyping.
\newer{This enables the analysis of a system from both the safety and the performance perspective using the same simulation setup.}
To show the applicability of this methodology, we model an \new{LPDDR5} memory subsystem of a current state-of-the-art ADAS platform and evaluate both the ASIL \newer{and the performance impact of the safety measures.}
The new methodology is fully implemented in SystemC and provided as open source.
%
%Adding additional safety measures on system level instead of device level is still possible, but it comes at the cost of reduced performance.
%Therefore, consumer HW must be configured/adapted so that the ASIL can still be achieved.
%These additional safety measures impact performance -> tradeoff between safety and performance
%\new{Especially the advent of consumer hardware like LPDDR memories for autonomous driving is a major challenge for the automotive community.}\todo{consumer HW is not intended for use in safety-critical systems, only few safety measures, no metadata for redundancy..., additional safety measures can be introduced at the cost of reduced performance}
%\todo{the author should probably consider describing a larger proposal to evaluate
%the impact of possible safety measures since the beginning of the paper.}
}
\keywords{ISO\,26262, SystemC, DRAM, LPDDR4, LPDDR5, Safety}
%%\pacs[JEL Classification]{D8, H51}
%%\pacs[MSC Classification]{35A01, 65L10, 65L12, 65L20, 65L70}
\maketitle
% PLAN:
% Replace LPDDR4 with LPDDR5
% New system diagram
% Safety analysis of LPDDR5
% Bus error calculation by Kira
% Performance simulation by Derek --> show the advantages of orthogonal simulation
\newer{\section{Introduction}}
\label{sec:intro}
Functional safety is a major concern in the development of automotive applications because the lives of drivers, passengers, and other road users must be protected to the highest degree.
\newer{With the increasing use of \textit{Advanced Driver-Assistance Systems}~(ADAS) and \textit{Autonomous Driving}~(AD), the computational and memory requirements in this domain have grown significantly. Classical \textit{Electronic Control Units}~(ECUs) consisting of dedicated automotive microcontrollers and SRAM can no longer meet these requirements, so more performant platforms are being developed. In the absence of specialized devices for safety-critical applications, some parts of these platforms are based on consumer hardware such as LPDDR DRAM.}
The development of automotive components requires the use of specific quality and safety standards, such as ISO\,26262~\cite{iso26262}.
The implementation of this standard is intended to ensure the functional safety of a system with electrical/electronic components in road vehicles.
Parts 5 and 11 of the standard in particular deal with the development processes at the hardware level and define procedures and methods for achieving a specific \textit{Automotive Safety Integrity Level}~(ASIL).
For the development of a product, it is therefore very important to address safety concerns from the very beginning.
State-of-the-art approaches used to estimate hardware metrics are based on spreadsheets and do not scale well to large hardware systems.
\newer{In addition, because consumer hardware such as LPDDR DRAM was originally intended for a different domain, no specific automotive safety requirements were considered during design. This means that the devices themselves usually only achieve a low ASIL rating. Additional safety measures can be introduced at the system level in order to achieve the required safety goals for the entire platform. However, they add more complexity and can negatively impact performance. In the case of DRAM, this means lower bandwidth, higher latency, and reduced capacity. When developing a platform for the automotive domain, safety analysis must therefore always go hand in hand with performance analysis.}
Virtual prototypes based on SystemC are fully functional software models of physical hardware that can simulate complex hardware/software systems at reasonable speed.
Such virtual prototypes are state of the art in industry for reducing time to market and improving product quality~\cite{deschutter_14}.
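At its core, the hardware metrics analysis that such a safety-aware virtual prototype performs is simple rate arithmetic. The following minimal, self-contained C++ sketch illustrates the idea; the function names and numbers are purely illustrative and are not part of the library API presented later. It applies the conservative coverage approximation $\lambda_{\mathrm{RF}} = \lambda \cdot (1 - K_{\mathrm{DC}})$ and evaluates a single-point fault metric for a hypothetical element:

\begin{minted}[bgcolor=LightGray]{cpp}
// Illustrative sketch only, not the Split'n'Cover API: the kind of
// failure-rate bookkeeping required by the ISO 26262 hardware metrics.
// Rates are given in FIT (failures per 10^9 hours), coverages are
// probabilities in [0, 1].
#include <cstdio>

// Residual fault rate after a safety mechanism with diagnostic
// coverage dc (conservative approximation, cf. ISO 26262-5, C.3).
double residualRate(double lambdaFit, double dc) {
    return lambdaFit * (1.0 - dc);
}

// Single-point fault metric: SPFM = 1 - (lambda_SPF + lambda_RF) / lambda.
double spfm(double lambdaSpf, double lambdaRf, double lambdaTotal) {
    return 1.0 - (lambdaSpf + lambdaRf) / lambdaTotal;
}

int main() {
    // Hypothetical element: 100 FIT in total, 90 FIT of which are
    // covered by an ECC mechanism with 99 % diagnostic coverage.
    const double covered = 90.0, uncovered = 10.0;
    const double lambdaRf = residualRate(covered, 0.99);
    std::printf("lambda_RF = %.2f FIT, SPFM = %.4f\n",
                lambdaRf, spfm(uncovered, lambdaRf, covered + uncovered));
    return 0;
}
\end{minted}

The contribution of the methodology presented below is to perform exactly this kind of bookkeeping natively inside SystemC simulation models instead of in external spreadsheets.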
\new{This paper is an extended journal version of the previous SAMOS publication~\cite{uecjun_22} in which we presented a novel methodology for combining the advantages of SystemC-based virtual prototypes with the safety analysis required by the ISO\,26262 standard.}
Unlike previous work, our approach does not focus on system simulation and fault injection, but rather on the specific methodology required by ISO\,26262 for evaluating hardware architecture metrics and how this can be implemented as an extension to the SystemC standard.
Failure modeling can be seen as orthogonal to the modeling of functionality.
With our approach, both aspects can be integrated into the same simulation models, which provides the opportunity to analyze structure, functionality, and safety aspects simultaneously.
Due to the power of the SystemC framework, we achieve a high level of interoperability, and functional legacy models can be enhanced with our safety amendments.
The presented method receives failure rates in \textit{Failure in Time}~(FIT) and directly calculates the achievable ASIL as well as the hardware metrics as outputs.
\newer{In this journal version, we update the safety analysis of our previous work~\cite{uecjun_22} from LPDDR4 to LPDDR5 and extend it with an associated performance analysis. As a case study, we investigate the LPDDR5 memory subsystem of a current platform for automotive applications. LPDDR5 features more safety measures than its predecessor, and the platform adds additional error correction at the system level. By combining functional and safety simulations with SystemC, we show the impact of these safety measures on performance.}
\noindent In summary, we make the following contributions:
\begin{itemize}
\item We present a set of basic blocks that represent the operations required by ISO\,26262.
\item We present, for the first time, a methodology called Split'n'Cover that uses these basic blocks to model and evaluate hardware systems with respect to the ASIL.
\item We provide an open-source reference implementation as a SystemC library\footnote{\url{https://github.com/myzinsky/ISO26262SystemC}}.
\item We show the application of this methodology for an example \new{LPDDR5} DRAM memory subsystem in the automotive context.
\item \new{We provide an analysis of an actual LPDDR5-based automotive control unit that combines performance simulations and safety simulations based on our new methodology. This way, trade-offs can already be analyzed on the system level.}
\end{itemize}
This paper is structured as follows: Section~\ref{sec:background} provides some background on ISO\,26262 and the required hardware metrics. Related work is discussed in Section~\ref{sec:related}. The methodology is presented in Section~\ref{sec:method}, whereas the actual implementation in SystemC is explained in Section~\ref{sec:implementation}. Section~\ref{sec:study} presents a case study for an \new{LPDDR5} DRAM memory system \new{and Section~\ref{sec:results} presents the experimental results}. Finally, Section~\ref{sec:conclusion} concludes the paper.
\section{Background}
\label{sec:background}
In this section, we present the basic safety concepts needed to understand the hardware metrics analysis of ISO\,26262.
For the safety analysis of hardware, we follow the definitions of Laprie et\,al.~\cite{avilap_04}:
\begin{description}
\item[Fault:] A defect within the system and the root cause of the violation of a safety goal, e.g., a stuck-at-0 or a single event upset due to a cosmic ray.
\item[Error:] An erroneous internal state, e.\,g., in the memory or the CPU, where the fault becomes visible.
\item[Failure:] Occurs when the error is observed and the system's behavior deviates from the specification. This might lead to the violation of a safety goal.
\end{description}
As shown in Figure~\ref{fig:bathtube}, hardware failure rates $\lambda(t)$ usually follow the so-called \textit{Bathtub Curve}. In phase \textbf{I}, we can observe early failures called \textit{Infant Mortality}. In phase \textbf{II}, the failure rate is constant; these failures are known as random failures. This phase is also called \textit{Useful Lifetime}. To avoid shipping products that are still in phase~\textbf{I}, a burn-in process is used to artificially age the product, such that it enters the market in phase~\textbf{II}. In the third phase \textbf{III}, the failure rate increases again due to \textit{Wear-Out}, i.e., aging effects.
%
\begin{figure}
\centering
\includegraphics[width=\linewidth]{bathtube.pdf}
\caption{Hardware Failure Bathtub Curve~\cite{iso26262}}
\label{fig:bathtube}
\end{figure}
\newpage
For the analysis of hardware failures, the ISO\,26262 assumes that the hardware is used during its useful lifetime and that the failure rate $\lambda(t)$ is constant (see ISO\,26262-11~\cite{iso26262}).
%
For constant failure rates, we can assume the exponential failure distribution
$$F(t) = 1 - e^{-\lambda\cdot t}$$
The constant failure rate $\lambda$ of this distribution is measured in \textit{Failure in Time} (FIT), where 1~FIT represents one failure in $10^9$ hours, which is approximately one failure in 114,080 years.
For the hardware metrics, ISO\,26262 distinguishes several different failure rates, among them:
\begin{itemize}
\item $\lambda_\mathrm{SPF}$ \textit{Single-Point Fault Failure Rate}: Considers faults that are not covered by any safety mechanism and immediately lead to the violation of a safety goal.
\item $\lambda_\mathrm{RF}$ \textit{Residual Fault Failure Rate}: Considers faults for which a safety mechanism is implemented but which are not controlled by this mechanism and thus lead to the violation of a safety goal.
\item $\lambda_\mathrm{MPF}$ \textit{Multi-Point Fault Failure Rate}: Considers several independent faults, which in combination lead to the violation of a safety goal. For this paper, the latent faults $\lambda_\mathrm{MPF,L}$, whose presence is neither detected by a safety mechanism nor perceived by the driver, are especially important.
\item $\lambda_\mathrm{S}$ \textit{Safe Fault Failure Rate}: Considers faults that do not have any significant influence on the violation of a safety goal.
\end{itemize}
\noindent The total failure rate is the sum of the above failure rates:
$$\lambda = \lambda_\mathrm{SPF} + \lambda_\mathrm{RF} + \lambda_\mathrm{MPF} + \lambda_\mathrm{S}$$
%
The ISO\,26262 furthermore specifies the hardware metrics used to evaluate the risk posed by hardware elements:
\begin{description}
\item[Single-Point Fault Metric (SPFM):] This metric reflects the coverage of a hardware element with respect to single-point faults either by design or by coverage via safety mechanisms.
$$ \mathrm{SPFM} = 1 - \frac{\sum \left( \lambda_\mathrm{SPF} + \lambda_\mathrm{RF}\right)}{\sum \lambda}$$
\item[Latent Fault Metric (LFM):] This metric reflects the coverage of a hardware element with respect to latent faults either by design (primarily safe faults), by fault coverage via safety mechanisms, or by the driver’s recognition of a fault’s existence within the fault-tolerant time interval of a safety goal.
$$ \mathrm{LFM} = 1 - \frac{\sum \lambda_\mathrm{MPF,L}}{\sum \left(\lambda - \lambda_\mathrm{SPF} - \lambda_\mathrm{RF}\right)}$$
\end{description}
\begin{table}[t]
\centering
\begin{tblr}{cccc}
\hline
\textbf{ASIL} & \textbf{SPFM} & \textbf{LFM} & \textbf{Residual FIT} \\
\hline
\textbf{A} & - & - & $< 1000$ \\
\textbf{B} & $> 90\%$ & $> 60\%$ & $< 100 $ \\
\textbf{C} & $> 97\%$ & $> 80\%$ & $< 100 $ \\
\textbf{D} & $> 99\%$ & $> 90\%$ & $< 10 $ \\
\hline
\end{tblr}
\vspace{10pt}
\caption{Requirements according to ISO\,26262~\cite{iso26262}}
\label{tab:target}
\end{table}
Table~\ref{tab:target} shows the required target values for $\lambda_\mathrm{RF}$, SPFM, and LFM to reach a specific ASIL. For example, the highest level ASIL\,D can only be reached if the SPFM is greater than 99\,\%, the LFM is greater than 90\,\%, and the residual failure rate is below 10\,FIT.

\section{Related Work}
\label{sec:related}
%
This section discusses related work and the state of the art.
\new{Today's safety standards, such as ISO\,26262, recommend techniques such as \textit{Failure Mode and Effects Analysis} (FMEA), \textit{Fault Tree Analysis} (FTA), or Markov Chains for safety analysis. However, none of these techniques can be used directly to obtain the hardware metrics (SPFM and LFM).
FMEA, as an inductive analysis technique, is suitable for investigating the system from the bottom up and identifying root causes that lead to unwanted system effects. This is helpful in understanding the relationship between HW faults and safety goal violations at the system level. However, FMEA is a purely qualitative analysis technique and therefore not suitable for calculating the desired HW metrics. In practice, a \textit{Failure Mode Effects and Diagnostic Analysis} (FMEDA) is used instead. FMEDAs are typically performed using spreadsheets to systematically examine the HW of the system under analysis for root causes.
For each root cause that has the potential to violate a safety goal, safety mechanisms are identified and the coverage with respect to residual and latent faults is determined, providing the basis for calculating the HW metrics for the system under consideration. A special aspect of this technique is that each HW element is analyzed atomically. This is an advantage from a modularization point of view but can be seen as a disadvantage in understanding complex functional dependencies between these elements. In this respect, other techniques may be more appropriate.}
\new{Analysis techniques, such as Markov Chains and Fault Trees, are more expressive and better suited to qualitatively and quantitatively examine complex dependencies. In practice, however, they are less favored, in part because they are typically employed only when high assurance levels are required and in part because they can be more demanding from a modeling perspective. The latter is particularly true for Markov Chains, as the models can grow very rapidly, leading to models that are difficult to understand and maintain.
Fault Trees, and in particular the Component Fault Trees (CFT) introduced by \newer{Kaiser et\,al.~\cite{kailig_03}}, offer a more structured approach. They make it possible to modularize and associate Fault Trees with each of the elements in the HW. Consequently, CFT models compose fault trees according to the hierarchical structure employed in the system design. From the point of view of computing HW metrics, as mentioned earlier, these techniques do not allow a direct computation, since they allow at most the calculation of the probability of occurrence of events. Theoretically, it would be possible to integrate new modeling elements, such as a ``measure'' type of event, to integrate the computation of the coverage fraction of the safety mechanisms. However, this is likely to increase the modeling effort and the complexity of the models.
Our method combines the most important aspects of the above techniques. First, we overcome the drawback of FMEDA by defining a graphical notation that maintains a level of expressiveness similar to CFTs while keeping the modeling approach simple, allowing us to investigate complex dependencies between the HW elements. Second, we maintain the simplified computation of the HW metrics of FMEDA by integrating dedicated modeling elements.}
The usage of SystemC-based virtual prototypes for safety analysis is already well established. However, all these approaches focus on simulation of the functionality and injection of errors. For example, in~\cite{reipre_13}, the authors present how virtual prototypes can support the FMEA process. There also exist other works whose main focus is on fault injection during functional simulations~\cite{weisch_16,tabcha_16,silpar_14,tab_19}.
As mentioned above, all of these previous works focus on functional simulation and error injection for ISO\,26262 support. The focus of our work is the static hardware metrics analysis of ISO\,26262 and how it can be realized within SystemC.
%
%
% \include{blocks-safety-methodology}
%
\section{Methodology}
\label{sec:method}
In the following, we describe our new methodology for estimating the hardware metrics required by ISO\,26262. Similar to CFTs, our methodology is object-oriented, i.e., it models the system with the hardware components that also exist in reality. The safety behavior of each component is modeled in the component itself by using five central building blocks, which are shown in Figure~\ref{fig:blocks} and explained below.
The \textit{Basic Event} block represents internal faults, with a specific failure rate~$\lambda_\mathrm{BE}$.
The \textit{Sum} block receives the failure rates $\lambda_0$ to $\lambda_n$ and computes the sum of these failure rates.
The \textit{Coverage} block can be used to model the \textit{Diagnostic Coverage}~(DC) of a specific safety measure. The input failure rate $\lambda_\mathrm{in}$ is reduced by the DC rate $c$ of this safety measure:
%
$$\lambda_\mathrm{RF}=\lambda_\mathrm{in}\cdot(1-c)$$
%
For instance, if $\lambda=100\,$FIT and $c=0.95$, only 5\,\% of the failures, i.e., $5\,$FIT, are propagated. According to the ISO\,26262, the covered FITs must be added to the latent failures $\lambda_\mathrm{MPF,L}$ to consider the scenario where the safety measure itself is defective:
%
$$\lambda_\mathrm{MPF,L}=\lambda_\mathrm{in}\cdot c$$
%
In our example, $95\,$FIT are propagated to the latent fault metrics if no other measure reduces these failures.
The \textit{Split} block distributes the incoming failure rate to an arbitrary number of output ports according to specific rates $p_i$, where the condition
%
$$\sum_{i=0}^{n} p_i \leq 1$$
%
must hold; otherwise, new failures would be created out of nowhere. It is possible for some parts of the incoming failure rate to completely vanish because of the split, i.e., they are not propagated. These faults are called \textit{Safe Faults} because they will never lead to a safety goal violation. The safe fault failure rate can therefore be described as:
$$\lambda_\mathrm{S} = \lambda_\mathrm{in} \cdot \left(1-\sum_{i=0}^{n} p_i \right)$$
In summary, the \textit{Split} block is used to model failure distributions caused by the system structure, e.g., when a data stream is divided, or when the safety mechanism adds additional errors during the correction of unsupported faults, such as double-bit errors in a single-error correction mechanism.
The last required block is the \textit{ASIL} block, which calculates the ASIL from the $\lambda_\mathrm{SPF}$, $\lambda_\mathrm{RF}$, and $\lambda_\mathrm{MPF,L}$ within the entire system. This block implements the logic of Table~\ref{tab:target}.
With these five blocks, it is possible to model the safety behavior of hardware in compliance with the ISO\,26262. We would like to mention here that it is only necessary to consider faults of safety-related components. Components that are not safety-related do not have to be modeled at all; alternatively, their errors can be left unmodeled or unconnected (and are thus not considered in the sum of all failure rates). In Section~\ref{sec:study}, we present the modeling of a real-world automotive memory system in order to understand the interaction of these blocks.
%
%
\section{Implementation}
\label{sec:implementation}
%
%
In this section, we describe the implementation of the building blocks in \mbox{SystemC}, which is well established and a de-facto industry standard. Therefore, there are already many functional simulation models that can be enhanced with our safety methodology. SystemC offers the right infrastructure by providing the concept of modules, ports, and signals that are required for our basic blocks. Unlike graphical safety tools, it also offers programmability, and repetitions can be handled by loops. Furthermore, SystemC's port check is very helpful in the development phase of the safety model, since it will complain about unbound ports at the beginning of a simulation.
The failure rates are propagated by using a classical \texttt{sc\_signal}. For all blocks, we use the dynamic binding of SystemC for the sake of convenience. All blocks contain classical \texttt{SC\_METHOD}s, i.e., all hardware safety metrics are already calculated during the first delta cycles of the SystemC simulation and are printed out at the end of a simulation.
The first block is the \textit{Basic Event} block shown in Listing~\ref{listing:basic_event}, which receives the failure rate (\texttt{rate}) as a constructor argument and propagates this value to its output port.
\begin{listing}[!ht]
\begin{minted}[
bgcolor=LightGray,
fontsize=\footnotesize,
linenos
]{c++}
SC_MODULE(basic_event) {
    sc_out<double> output;
    double rate;

    SC_HAS_PROCESS(basic_event);
    basic_event(sc_module_name name, double rate) :
        output("output"), rate(rate) {
        SC_METHOD(compute_fit);
    }

    void compute_fit() {
        output.write(rate);
    }
};
\end{minted}
\caption{Implementation of the Basic Event Block in SystemC}
\label{listing:basic_event}
\end{listing}
%
The \textit{Sum} block has a dynamic input port array and a single output port. In its computation method, it calculates the sum of the incoming failure rates on all input ports, as shown in Listing~\ref{listing:sum}.
%
\begin{listing}[!ht]
\begin{minted}[
bgcolor=LightGray,
fontsize=\footnotesize,
linenos
]{c++}
SC_MODULE(sum) {
    sc_port<sc_signal_in_if<double>, 0, SC_ONE_OR_MORE_BOUND> inputs;
    sc_out<double> output;

    SC_CTOR(sum) : output("output") {
        SC_METHOD(compute_fit);
        sensitive << inputs;
    }

    void compute_fit() {
        double sum = 0.0;
        for (int i = 0; i < inputs.size(); i++)
            sum += inputs[i]->read();
        output.write(sum);
    }
};
\end{minted}
\caption{Implementation of the Sum Block in SystemC}
\label{listing:sum}
\end{listing}
%
The \textit{Coverage} block, shown in Listing~\ref{listing:coverage}, receives the DC as a constructor argument and calculates $\lambda_\mathrm{RF}$ (\texttt{output}) and $\lambda_\mathrm{MPF,L}$ (\texttt{latent}) according to the formulas presented in Section~\ref{sec:method}.
%
\begin{listing}[!ht]
\begin{minted}[
bgcolor=LightGray,
fontsize=\footnotesize,
linenos
]{c++}
SC_MODULE(coverage) {
    sc_in<double> input;
    sc_out<double> output;
    sc_port<sc_signal_out_if<double>, 0, SC_ZERO_OR_MORE_BOUND> latent;
    double dc;

    SC_HAS_PROCESS(coverage);
    coverage(sc_module_name name, double dc) :
        input("input"), output("output"), dc(dc) {
        SC_METHOD(compute_fit);
        sensitive << input;
    }

    void compute_fit() {
        output.write(input.read() * (1 - dc));
        if (latent.bind_count() != 0)
            latent->write(input.read() * dc);
    }
};
\end{minted}
\caption{Implementation of the Coverage Block in SystemC}
\label{listing:coverage}
\end{listing}
%
Compared to the other blocks, the implementation of the \textit{Split} block is more complex. Since we want to support dynamic binding and direct assignment of the failure distribution rate, we derived a custom \texttt{sc\_split\_port} from \texttt{sc\_port} that overrides the \texttt{bind} methods in order to allow specifying the split rate directly with the dynamic binding, as shown in Listing~\ref{listing:port}.
%
\begin{listing}[!ht]
\begin{minted}[
bgcolor=LightGray,
fontsize=\footnotesize,
linenos
]{c++}
template<class T>
class sc_split_out :
    public sc_port<sc_signal_out_if<T>, 0, SC_ONE_OR_MORE_BOUND> {
public:
    std::vector<double> split_rates;

    void bind(sc_interface& interface, double rate) {
        sc_port_base::bind(interface);
        split_rates.push_back(rate);
    }

    void bind(sc_out<T>& parent, double rate) {
        sc_port_base::bind(parent);
        split_rates.push_back(rate);
    }
};
\end{minted}
\caption{Implementation of the Custom Split Port in SystemC}
\label{listing:port}
\end{listing}
%
The actual implementation of the \textit{Split} component is shown in Listing~\ref{listing:split}. It receives a failure rate as input and distributes it to the output ports according to the assigned split rates.
%
\begin{listing}[!ht]
\begin{minted}[
bgcolor=LightGray,
fontsize=\footnotesize,
linenos
]{c++}
SC_MODULE(split) {
    sc_in<double> input;
    sc_split_out<double> outputs;

    SC_CTOR(split) : input("input") {
        SC_METHOD(compute_fit);
        sensitive << input;
    }

    void compute_fit() {
        for (int i = 0; i < outputs.size(); i++) {
            double rate = outputs.split_rates.at(i);
            outputs[i]->write(input.read() * rate);
        }
    }
};
\end{minted}
\caption{Implementation of the Split Block in SystemC}
\label{listing:split}
\end{listing}
The last building block is the \textit{ASIL} block, which estimates the ASIL of the system according to Table~\ref{tab:target}. It receives the single-point and residual failure rates $\lambda_\mathrm{SPF} +\lambda_\mathrm{RF}$ and the latent failure rates $\lambda_\mathrm{MPF,L}$ as input. Furthermore, it receives the total failure rate $\lambda$ as a constructor argument, calculates the ASIL, and prints it in the overridden \texttt{end\_of\_simulation()} callback function of the module, as shown in Listing~\ref{listing:asil}.
%
\begin{listing}[!ht]
\begin{minted}[
bgcolor=LightGray,
fontsize=\footnotesize,
linenos
]{c++}
SC_MODULE(asil) {
    sc_in<double> residual;
    sc_in<double> latent;
    double spfm;
    double lfm;
    std::string system_asil;
    double total;

    SC_HAS_PROCESS(asil);
    asil(sc_module_name name, double total) : total(total) {
        SC_METHOD(compute);
        sensitive << residual << latent;
    }

    void compute() {
        spfm = 100 * (1 - (residual / total));
        lfm = 100 * (1 - (latent / (total - residual)));
        if (spfm > 99.0 && lfm > 90.0 && residual < 10.0)
            system_asil = "ASIL D";
        else if (spfm > 97.0 && lfm > 80.0 && residual < 100.0)
            system_asil = "ASIL C";
        else if (spfm > 90.0 && lfm > 60.0 && residual < 100.0)
            system_asil = "ASIL B";
        else if (residual < 1000.0)
            system_asil = "ASIL A";
        else
            system_asil = "QM";
    }

    void end_of_simulation() override {
        // Printout of the estimated system ASIL...
    }
};
\end{minted}
\caption{Implementation of the ASIL Block in SystemC}
\label{listing:asil}
\end{listing}
%
\newpage
%
\section{\new{Case Study with LPDDR5}}
\label{sec:study}
%
\begin{figure}
\centering
\begin{circuitikz}
\useasboundingbox (-5.5,-5.5) rectangle (5.5,5.5);
\draw[blue] (0,0) node[qfpchip, num pins=16, hide numbers, no topmark, external pins width=0](C){SoC};
\draw[blue] ( 0, 4) node[qfpchip, num pins=16, hide numbers, no topmark, external pins width=0](D1){LPDDR5};
\draw[blue] ( 0,-4) node[qfpchip, num pins=16, hide numbers, no topmark, external pins width=0](D2){LPDDR5};
\draw[blue] (-4, 0) node[qfpchip, num pins=16, hide numbers, no topmark, external pins width=0](D3){LPDDR5};
\draw[blue] ( 4, 0) node[qfpchip, num pins=16, hide numbers, no topmark, external pins width=0](D4){LPDDR5};
\draw[blue] (C.bpin 16) to [multiwire=16] (D1.bpin 5);
\draw[blue] (C.bpin 15) to [multiwire] (D1.bpin 6);
\draw[blue] (C.bpin 14) to [multiwire] (D1.bpin 7);
\draw[blue] (C.bpin 13) to [multiwire] (D1.bpin 8);
\draw[blue] (C.bpin 12) to [multiwire=16] (D4.bpin 1);
\draw[blue] (C.bpin 11) to [multiwire] (D4.bpin 2);
\draw[blue] (C.bpin 10) to [multiwire] (D4.bpin 3);
\draw[blue] (C.bpin 9) to [multiwire] (D4.bpin 4);
\draw[blue] (C.bpin 8) to [multiwire=16] (D2.bpin 13);
\draw[blue] (C.bpin 7) to [multiwire] (D2.bpin 14);
\draw[blue] (C.bpin 6) to [multiwire] (D2.bpin 15);
\draw[blue] (C.bpin 5) to [multiwire] (D2.bpin 16);
\draw[blue] (C.bpin 4) to [multiwire=16] (D3.bpin 9);
\draw[blue] (C.bpin 3) to [multiwire] (D3.bpin 10);
\draw[blue] (C.bpin 2) to [multiwire] (D3.bpin 11);
\draw[blue] (C.bpin 1) to [multiwire] (D3.bpin 12);
\end{circuitikz}
\caption{\new{Memory Architecture of an Automotive SoC similar to Orin~\cite{kar_22}}}
\label{fig:memory_architecture}
\end{figure}
%
\new{
In the original conference paper~\cite{uecjun_22}, we modeled the automotive LPDDR4 DRAM architecture presented in~\cite{stekra_21}.
To show the scalability of our approach, in this work we model a more complex and more recent LPDDR5 memory system, which is similar to NVIDIA's Orin platform~\cite{kar_22}.}
%\todo{Show scalability, show benefits of safety analysis with SystemC -> both safety and performance impact of safety measures can be analyzed with the same simulation setup}
\new{Compared to its predecessor, LPDDR5 introduces a new \textit{Link Error Correction Code} (Link ECC) feature to reduce the high number of interface errors that occur due to higher data transfer rates.}
\newer{Figure~\ref{fig:memory_architecture} shows the system architecture, which consists of a high-performance \textit{System on Chip} (SoC) and four LPDDR5-6400 devices. Each memory device comprises four independent 16-bit channels, which in total results in 16 channels and a 256-bit memory interface with a theoretical maximum bandwidth of \qty{204.8}{\giga\byte\per\second} for the entire control unit.
To further increase data reliability, the platform is equipped with an in-line ECC mechanism.
Since LPDDR5 does not offer additional metadata bits as, e.g., HBM3 does, and the platform does not provide a separate device for redundancy as server \textit{Dual In-line Memory Modules} (DIMMs) do (so-called side-band ECC), a small portion of the device is dedicated to storing redundant bits instead of user data.
%Unlike DDR4 and DDR5 systems, where each \textit{Dual In-line Memory Module} (DIMM) has a dedicated, additional memory device to store the ECC data (so-called side-band ECC), the modeled platform does not have an additional ECC device.
%For such platforms, it is common to store the ECC data in-line and cache the recently accessed ECC data in the SoC.
In addition to the reduced memory capacity, the in-line ECC has a negative impact on the DRAM performance (bandwidth and latency): the redundancy has to be transmitted separately from the user data, while with side-band ECC it is transmitted in parallel.
In the worst case, each user data access must be accompanied by an additional redundancy access.
To estimate the performance overhead of the in-line ECC technique, in Section~\ref{sec:vp} we model the platform's DRAM subsystem within the DRAM simulator DRAMSys~\cite{junwei_15,stejun_20}. Like the safety model, DRAMSys is based on SystemC, so the system can be analyzed from both perspectives within the same simulation.}
%
%
\subsection{\new{LPDDR5 Safety Model}}
\label{sec:safety-model}
\new{Figure~\ref{fig:model} shows the safety model of this architecture realized with our new methodology. Since all 16 channels are independent, in the following we only consider a single LPDDR5 channel for the safety analysis.
Most of the errors originate in the DRAM array and the DRAM bus. We distinguish four different types of errors, which, according to \cite{buc_20,boe_21,stekra_21}, are the four main errors that may occur in the DRAM array: \textit{Single-Bit Errors} (SBE), \textit{Double-Bit Errors} (DBE), \textit{Multi-Bit Errors} (MBE), and \textit{Wrong Data} (WD). The exact distribution of these errors and their failure rates were obtained from \textit{Scenario 1} in \cite{stekra_21}.
As shown in Figure~\ref{fig:model}, these errors propagate upwards in the system to the next component, the internal LPDDR5 \textit{Single Error Correction} (SEC), which uses a $(136,128)$ Hamming ECC.}
%
\new{This SEC ECC is a safety mechanism that can correct all single-bit errors. Therefore, the SBEs are fully covered, reducing the residual failure rate $\lambda_\mathrm{RF}$ for SBEs to zero. This is modeled with the \textit{Coverage} block. However, if this SEC ECC safety mechanism is defective, the covered failure rate must be added to the latent SBE failure rate $\lambda_\mathrm{MPF,L}$, which propagates to the next component. Additionally, the failure rate of the SEC ECC itself must be added to the latent failure rate.
Therefore, we model an additional \textit{Basic Event} called \textit{SEC ECC Broken} (SB).}
\new{In the case of an incoming DBE, two scenarios have to be differentiated. First, if there is a defect in the SEC engine, the DBE will stay a DBE. Second, if there is no defect in the SEC engine, it will either detect that there is an uncorrectable error or attempt to correct the data, resulting in the introduction of a third error. The probability of introducing a third error largely depends on the specific code that is used. According to~\cite{davkap_81,stekra_21}, 83\,\% of the DBEs stay DBEs, while a third error (TBE) is introduced in 17\,\% of the cases. In order to model this behavior, a \textit{Split} component is used, which distributes the incoming DBE failure rate to DBE and TBE failure rates, respectively.
In the case of an incoming MBE or WD, the SEC engine is not able to correct any bit errors. Thus, these failure rates are always propagated.}
\new{Compared to LPDDR4, LPDDR5 supports higher data transfer rates at the bus interface, which, in turn, leads to higher bit error rates for the transmission between DRAM controller and device. For that reason, LPDDR5 introduces a link ECC mechanism, which uses a \textit{Single Error Correction Double Error Detection} (SECDED) code in the form of a $(137,128)$ Hamming ECC.}
\new{Therefore, we analyze the FITs of a typical LPDDR5 interface.
According to JEDEC, the interface must guarantee a \textit{Bit Error Rate} (BER) of at most $10^{-16}$ for a single DRAM pin \todo{(CITE, applies only to LPDDR4)}.}
\newer{As each code word consists of 137 bits, we can compute the probability for multi-bit errors within one code word with
\[ p(e) = \binom{n}{e} \cdot \mathrm{BER}^e \cdot \left(1-\mathrm{BER}\right)^{n-e},\]
where $e$ is the number of errors and $n$ is the number of transmitted bits.}
\new{Since the ISO\,26262 requires FIT rates for the safety analysis, the probabilities have to be converted. This can be achieved by computing
\[\lambda_\mathrm{Link}(e) = p(e) \cdot \mathrm{DR} \cdot w \cdot \qty{e9}{\hour},\]
where DR is the data transfer rate of the memory, in our case \qty{6400}{\mega\transfer\per\second}, and $w = 16$ is the width of the channel interface in bits. Table~\ref{tab:bus-errors} shows the FIT rates for SBE, DBE, and MBE, where MBE is computed as
\[ \mathrm{MBE} = \sum_{e=3}^{16} \lambda_\mathrm{Link}(e).\]}
\newer{It is important to highlight that the SBE rate is very large, while the DBE and MBE rates can be neglected, i.e., with a BER of $10^{-16}$, it is very unlikely that a DBE or MBE will occur. Therefore, Figure~\ref{fig:model} also does not include the DBEs and MBEs of the bus.
This clearly shows the necessity for a SECDED link ECC for high-speed interfaces to make sure that SBEs will be detected and corrected.}
\begin{table}[t]
\centering
\newer{\begin{tblr}{lcc}
\hline
\textbf{Number of Errors ($e$)} & $p(e)$ & $\lambda_\mathrm{Link}(e)$ [FIT] \\
\hline
1 (SBE) & $1.370\cdot10^{-14}$ & $5.050\cdot10^{9}$ \\ % 1.370e-14
2 (DBE) & $9.316\cdot10^{-29}$ & $3.434\cdot10^{-5}$ \\ % 9.316e-29
3-16 (MBE) & $4.192\cdot10^{-43}$ & $1.545\cdot10^{-19}$ \\ % 4.192e-43
\hline
\end{tblr}}
\vspace{10pt}
\caption{\new{Bus Failure Rates}}
\label{tab:bus-errors}
\end{table}
\begin{figure}
\centering
\newcommand\width{10}
\begin{circuitikz}
\foreach \x in {0,...,63}{
    \ifthenelse{\(\x<16\)\OR\(\x>31\AND\x<48\)}{
        \newcommand\farbe{gray!20}
    }{
        \ifthenelse{\x>55}{
            \ifthenelse{\(\x>55\)\AND\(\x<60\)}{
                \newcommand\farbe{red}
            }{
                \newcommand\farbe{gray}
            }
        }{
            \newcommand\farbe{white}
        }
    }
    \ifthenelse{\x=59}{
        \fill[fill=red] (\x*\width*0.015625, 0.5) rectangle ++(\width*0.015625*0.5, -0.5) {} coordinate(c1);
        \fill[fill=gray](c1) rectangle ++(\width*0.015625*0.5, 0.5) {};
        \node[fit={(\x*\width*0.015625,0)(\x*\width*0.015625+\width*0.015625,0.5)}, inner sep=0pt, draw=black] (rec\x) {};
    }{
        \node[fit={(\x*\width*0.015625,0)(\x*\width*0.015625+\width*0.015625,0.5)}, inner sep=0pt, draw=black, fill=\farbe] (rec\x) {};
    }
}
\draw(rec15.south) to [open] ++(0,-0.15) coordinate(e1);
\draw[red, thick](rec0.south) ++(0,-0.15) to [short, name=s1] (e1);
\draw[red](s1.center) -- ++(0,-0.25) -| (rec56);
\draw(rec31.south) to [open] ++(0,-0.15) coordinate(e2);
\draw[red, thick](rec16.south) ++(0,-0.15) to [short, name=s2] (e2);
\draw[red](s2.center) -- ++(0,-0.50) -| (rec57);
\draw(rec47.south) to [open] ++(0,-0.15) coordinate(e3);
\draw[red, thick](rec32.south) ++(0,-0.15) to [short, name=s3] (e3);
\draw[red](s3.center) -- ++(0,-0.75) -| (rec58);
\draw(rec55.south) to [open] ++(0,-0.15) coordinate(e4);
\draw[red, thick](rec48.south) ++(0,-0.15) to [short, name=s4] (e4);
\draw[red](s4.center) -- ++(0,-1.00) -| (rec59);
\draw[red] (rec57) ++ (0,0.5) node[]{ECC};
\draw[thick] (0,-1.5) rectangle ++(\width,3);
\draw(0,1.75) node[right]{DRAM Bank};
\draw(0,0.75) node[right]{DRAM Row};
\end{circuitikz}
\caption{In-Line ECC in a Single DRAM Bank}
\label{fig:in-line}
\end{figure}
\new{As mentioned before, the memory controller in our automotive platform uses an in-line ECC mechanism with redundancy stored in the same device as the user data. Figure~\ref{fig:in-line} shows a typical in-line ECC mapping, where the parity bits are stored at the end of the corresponding DRAM row that they protect. The figure also shows which user data accesses are covered by which ECC accesses. Each box corresponds to a single DRAM access with the minimum burst length of 16 (\qty{256}{\bit}, i.e., \qty{32}{\byte}). Since the corresponding parity bits are stored in the same row, the additional ECC DRAM access does not result in a row miss, which is beneficial for the performance. In Section~\ref{sec:results}, we discuss the performance overhead for the best and worst case scenarios.
The ECC used is a $(272,256)$ Hamming SECDED code with 16 redundancy bits per DRAM access.
%
Since the redundancy is not stored in an additional chip or in additional metadata bits, the effective memory size is reduced by \qty{12.5}{\percent}. Moreover, as shown in Figure~\ref{fig:in-line}, 4.5 DRAM accesses per row remain unused. This area could be dedicated to additional safety measures and more powerful ECC algorithms in the future.}
\new{There exist further components in the model shown in Figure~\ref{fig:model}. For instance, in the DRAM-TRIM component, the redundancy of the code is removed, possibly also reducing the number of data errors. For further explanations of these components, we refer to the paper~\cite{stekra_21} and the previous conference paper~\cite{uecjun_22}}.
%
\subsection{\new{LPDDR5 Performance Model}}
\label{sec:vp}
\new{In order to estimate the overhead of the in-line ECC mechanism, we integrated our safety model into the DRAM design space exploration framework \mbox{DRAMSys~\cite{junwei_15,stejun_20}}. In DRAMSys, we model the same architecture as shown in Figure~\ref{fig:memory_architecture}. We use traffic generators to stimulate the memory system with a \textit{sequential} (best case) and a \textit{random} (worst case) access pattern.
In order to generate the required additional ECC requests, a new module is inserted between a regular traffic generator and the DRAM subsystem. This module keeps track of the currently fetched parity bits for all banks. When new parity bits are required, an additional ECC request is performed before the initiating request is issued to the DRAM. For each bank, the module can hold the data of four ECC requests (redundancy for one complete row) at once.}
\new{Additionally, the addresses of all incoming requests are offset by an incrementing amount to accommodate the ECC memory regions and the unused space in each DRAM row, as shown in Figure~\ref{fig:in-line}. The offset is derived from the following equations, where $R$ is the original row, $R'$ the new offset row, $C$ the original column, and $C'$ the offset column:
%
\[ C'=\left(R\cdot 256+C\right)~\mathrm{mod}~1792 \]}
\new{\[ R'=\left\lfloor\frac{R\cdot 256+C}{1792}\right\rfloor+R \]}
\begin{figure}[p]
\centering
\include{model}
\caption{\new{Safety Model of a Single LPDDR5 Channel}}
\label{fig:model}
\end{figure}
\begin{figure}[p]
\centering
\include{result1}
\caption{Absolute Metrics (LPDDR5)}
\label{fig:absolute}
\end{figure}
\begin{figure}[p]
\centering
\include{result2}
\caption{Relative Metrics (LPDDR5)}
\label{fig:relative}
\end{figure}

\section{Experimental Results}
\label{sec:results}
In this section, we first discuss the results of the safety analysis and then the results of the performance analysis.
%
\subsection{Safety Analysis}
Unlike the state of the art, which only analyzes a single DRAM failure rate, we take our analysis one step further. Through the use of SystemC, many different scenarios can easily be computed in parallel. In order to analyze the safety behavior of the provided DRAM system and the ECC safety measures, the DRAM's failure rate $\lambda_\mathrm{DRAM}$ is swept from 1\,FIT to 2500\,FIT in several simulations. For this simulation, we assume that only the DRAM-related hardware components influence the safety goal under consideration, and leave out other hardware elements on the SoC, which were considered in~\cite{stekra_21}. In practice, failure rate budgets are distributed to the involved hardware elements. In this case, as shown in Figure~\ref{fig:absolute}, the requirement for ASIL\,D ($< 10$\,FIT) could be reached if the DRAM's failure rate stayed below 53\,FIT. However, if we take a look at the relative metrics shown in Figure~\ref{fig:relative}, we can see that, with a value of 81\,\%, the SPFM is far below the ASIL\,D threshold of 99\,\%. Even ASIL\,B with an SPFM threshold of 90\,\% cannot be reached. From the LFM perspective, ASIL\,B could easily be reached, and even ASIL\,C could be reached for higher $\lambda_\mathrm{DRAM}$ rates. Since for any ASIL classification both the relative and absolute metrics must be fulfilled, we observe that, independent of the DRAM's failure rate $\lambda_\mathrm{DRAM}$, we cannot achieve a level higher than ASIL\,A. Thus, improving the failure rates of the DRAM technology itself does not help.
\new{Our results for LPDDR5 are similar to the LPDDR4 results of the original conference paper. Although LPDDR5 introduces link ECC as an additional safety measure, it still cannot achieve high ASIL ratings.
It is necessary to introduce more robust and holistic safety measures within the DRAM and the memory controller as well as on the software level.}
\new{Note that link ECC can only correct errors that occur on the bus: its parity bits are calculated directly before transmission, and correction is performed directly after transmission. Errors that are already present in the data before transmission (e.g., errors originating from the array) are simply propagated and remain in the data after transmission. In contrast to the original conference paper, which considered lower link error rates and only multi-bit errors, we now distinguish between single-bit, double-bit and multi-bit errors. Furthermore, LPDDR4 operates with a higher VDDQ and lower data rates (at most \qty{4266}{\mega\transfer\per\second}, compared to \qty{6400}{\mega\transfer\per\second} for LPDDR5 and \qty{8533}{\mega\transfer\per\second} for LPDDR5X). Since the dominant error rate remains unchanged and the single-bit errors of LPDDR5 are corrected by the link ECC, the overall influence on the final result is small.}
This confirms the results presented in~\cite{stekra_21,buc_20} for single scenarios. They also conclude that with the current ECC safety measures, no rating higher than ASIL\,A can be achieved. Since it is unlikely that future DRAM technologies will lead to a decrease in failure rates, it is very important to introduce further safety measures to make the DRAM system ready for future ASIL requirements.
\subsection{Performance Analysis}
%
\new{Additional safety measures usually come at the expense of reduced performance and storage capacity. As shown in Figure~\ref{fig:in-line}, the overhead in storage is \qty{12.5}{\percent}. The impact on performance cannot simply be calculated analytically, so simulations must be carried out. We analyze the performance, i.e., bandwidth and latency, of the discussed LPDDR5 memory subsystem with a best case and a worst case benchmark. In DRAM systems, the best case is usually estimated with a sequential access pattern, i.e., addresses are increased incrementally.
For the worst case, a random access pattern is used, since each memory access results in a row miss, which lowers bandwidth and increases latency.}
\new{Figure~\ref{fig:bandwdith} shows the theoretical maximum bandwidth of a single LPDDR5 channel, which is \qty{102.4}{\giga\bit\per\second}. With the sequential access pattern, a bandwidth utilization of \qty{100.53}{\giga\bit\per\second} is reached when the ECC functionality is disabled. About \qty{2}{\percent} of the maximum bandwidth is lost due to refresh and the remaining page misses. When ECC is enabled, the bandwidth drops to \qty{94.58}{\giga\bit\per\second}, which corresponds to a decrease of another \qty{5.8}{\percent}. Note that with in-line ECC the theoretical maximum user data bandwidth is also reduced, to \qty{95.57}{\giga\bit\per\second}, since 4 out of every 60 accesses transfer parity bits instead of user data. The remaining drop is small because with a sequential pattern all the columns within a row are accessed successively and the fetched parity bits can be fully utilized, i.e., only 4 additional ECC accesses are required for 56 user data accesses (see Figure~\ref{fig:in-line}). When the DRAM is stressed with a worst case scenario, i.e., a fully random access pattern where each data access results in a row miss, the real bandwidth utilization without ECC is \qty{47.27}{\giga\bit\per\second}, which is only \qty{46}{\percent} of the theoretical maximum bandwidth. With enabled ECC, the bandwidth drops by another \qty{13}{\percent} to \qty{33.58}{\giga\bit\per\second}. In this case, the drop is greater because each user data access requires an additional ECC access, although this ECC access is always at least a row hit. When the bandwidth drop is set in direct relation to the real bandwidth utilization, it corresponds to a decrease of \qty{29}{\percent}, i.e., for random traffic the DRAM channel loses almost one third of its performance due to the additional safety measure.}
\begin{figure}[t!]
\centering
\begin{tikzpicture}
\begin{axis}[
ybar=1pt,
bar width = 20pt,
ymin=0,
ymajorgrids,
yminorgrids,
ylabel={Avg. Bandwidth [Gbit/s]},
symbolic x coords = {Sequential, Random, MAX},
xtick=data,
enlarge x limits=0.25,
legend style={at={(0.5,0.95)}, anchor=north,legend columns=1},
]
\addplot
coordinates {(Sequential, 100.53)(Random, 47.27)(MAX, 102.40)};
\addplot
coordinates {(Sequential, 94.58)(Random, 33.58)(MAX, 95.57)};
\legend{\small Without ECC, \small With ECC}
\end{axis}
\end{tikzpicture}
\caption{\new{Bandwidth Comparison}}
\label{fig:bandwdith}
\end{figure}
%
\new{Furthermore, we analyze the impact of the in-line ECC on latency. To do this, the frequency at which requests are issued to the DRAM subsystem is varied from \qty{25}{\mega\hertz} in steps of \qty{25}{\mega\hertz} up to \qty{400}{\mega\hertz}, which is the maximum that a channel with a data rate of \qty{6400}{\mega\transfer\per\second} and a burst length of 16 can theoretically handle.
Figures~\ref{fig:lat_bw:linear} and~\ref{fig:lat_bw:random} plot the average response latency of all requests over the achieved bandwidth for the four investigated scenarios. In the sequential case, the idle response latency is about \qty{30}{\nano\second} with disabled ECC and increases only marginally (by about \qty{1}{\nano\second}) when ECC is enabled. At high request issue frequencies, the impact of ECC becomes more visible, as the graph starts to saturate slightly earlier and the maximum response latency is higher (\qty{164}{\nano\second} compared to \qty{157}{\nano\second}). In the random case, the idle response latency without ECC is already about \qty{55}{\nano\second} because the target row must always be activated first. When ECC is enabled, it increases by about \qty{9}{\percent} to \qty{60}{\nano\second} because an additional ECC access is issued before each user data access. The impact at high request issue frequencies is also more significant than in the sequential case. With ECC, the graph starts saturating at around \qty{130}{\mega\hertz} compared to \qty{180}{\mega\hertz} without ECC, and the maximum response latency increases from \qty{343}{\nano\second} to \qty{486}{\nano\second}. This means that the channel with in-line ECC can handle almost \qty{30}{\percent} less random traffic, which is consistent with the bandwidth results in Figure~\ref{fig:bandwdith}.
}
\new{This is a reasonable performance overhead for establishing safety. However, since the current safety measures are not sufficient to reach levels above ASIL\,A, it will be necessary to add further safety measures or stronger coding techniques, such as a \textit{Cyclic Redundancy Check} (CRC), in the future.}
%
\begin{figure*}%[h!]
\begin{subfigure}[b]{0.49\textwidth}
\centering
\begin{tikzpicture}
\begin{axis}[
ylabel={\textbf{Latency [ns]}},
xlabel={\textbf{Bandwidth [Gbit/s]}},
grid=minor,
width = \textwidth,
height = 5.25cm,
xmin = 0,
ymin = 0,
xmax = 120,
ymax = 500,
legend style={legend pos=north west, font=\small}
]
% Without ECC
\addplot[BrickRed, thick, mark=square, line cap=round, smooth]
coordinates {
(7.68,30.1) (14.08,28.9) (20.48,28.6) (26.88,28.3)
(33.28,28.4) (39.67,28.6) (46.07,28.6) (52.4,29.0)
(58.85,28.9) (65.26,29.4) (71.67,29.6) (78.04,30.5)
(84.39,31.7) (90.83,34.8) (97.2,44.7) (100.53,156.5)
};
% With ECC
\addplot[MidnightBlue, thick, mark=square, line cap=round, smooth]
coordinates {
(7.68,31.1) (14.08,30.0) (20.48,29.22) (26.88,29.4)
(33.28,29.61) (39.67,29.83) (46.07,30.2) (52.4,30.86)
(58.85,30.92) (65.26,32.34) (71.64,33.93) (78.01,36.71)
(84.38,40.62) (90.75,51.17) (94.58,163.675)
};
\addplot[Black, thick, line cap=round, smooth, dashed]
coordinates { (102.4, 0) (102.4, 500) } node[below, pos=0.5, rotate=90, font=\small] {Maximum (102.4\,Gb/s)};
\legend{
Without ECC,
With ECC,
}
\end{axis}
\end{tikzpicture}
\caption{Sequential}
\label{fig:lat_bw:linear}
\end{subfigure}
\hfill
\begin{subfigure}[b]{0.49\textwidth}
\centering
\begin{tikzpicture}
\begin{axis}[
ylabel={\textbf{Latency [ns]}},
xlabel={\textbf{Bandwidth [Gbit/s]}},
grid=minor,
width = \textwidth,
height = 5.25cm,
xmin = 0,
ymin = 0,
xmax = 120,
ymax = 500
]
% Without ECC
\addplot[BrickRed, thick, mark=square, line cap=round, smooth]
coordinates {
(1.28,54.7) (7.68,64.5) (14.08,71.2) (20.48,77.1)
(26.87,87.1) (33.25,100.6) (39.63,125.9) (45.92,205.6)
(47.2,343.0)
};
% With ECC
\addplot[MidnightBlue, thick, mark=square, line cap=round, smooth]
coordinates {
(1.28,59.7) (7.68,70.22) (14.08,77.62) (20.48,84.89)
(26.87,99.31) (33.24,195.91) (33.54,485.63)
};
\addplot[Black, thick, line cap=round, smooth, dashed]
coordinates { (102.4, 0) (102.4, 500) } node[below, pos=0.5, rotate=90, font=\small] {Maximum (102.4\,Gb/s)};
\end{axis}
\end{tikzpicture}
\caption{Random}
\label{fig:lat_bw:random}
\end{subfigure}
%%%%
\caption{Average Response Latency over Bandwidth for Sequential and Random Access Patterns}
\label{fig:lat_bw}
\end{figure*}
\section{Conclusion and Future Work} \label{sec:conclusion}
In this paper, we presented a new methodology for modeling the safety behavior of modern hardware systems in compliance with the ISO\,26262 automotive standard. The implementation of this new methodology is provided as an open-source SystemC library and can be used to enhance legacy models with safety and quality analysis. In order to demonstrate the power of this new methodology, we modeled a state-of-the-art automotive DRAM memory architecture. Based on this model, we simulated a continuous space of failure rates of the DRAM system. We conclude that with the current safety measures, it is not possible to achieve a rating higher than ASIL\,A.
\new{Furthermore, we combined the safety simulation with a functional simulation, such that the overhead of the safety measures could be estimated quickly. In fact, we see a storage overhead of \qty{12.5}{\percent} and a bandwidth overhead of \qty{5.8}{\percent} in the best case and \qty{13}{\percent} in the worst case. In the future, we will use the presented methodology to analyze new safety measures that could help reach the goal of an ASIL\,D certification.}
%
\section*{Authors' Contributions} All authors contributed to all parts of the paper.
%
\section*{Funding} This work was partly funded by the German Federal Ministry of Education and Research (BMBF) under grant 16ME0717 (MANNHEIM-MEMTONOMY) and supported by the Fraunhofer High Performance Center for Simulation- and Software-Based Innovation.
%
\section*{Conflict of Interest} There is no conflict of interest.
%
\section*{Acknowledgements} There are no acknowledgements at this time.
%
\bibliography{references_JR}% common bib file
%
\end{document}