misc: Merge branch 'release-staging-v23-0' into stable

Change-Id: Ie2012ea0ae86401181cf02de3e22401e406a18e6
This commit is contained in:
Bobby R. Bruce
2023-07-07 19:25:10 -07:00
1793 changed files with 67805 additions and 18789 deletions


@@ -26,3 +26,6 @@ c3bd8eb1214cbebbc92c7958b80aa06913bce3ba
# A commit which ran Python Black on all Python files.
# https://gem5-review.googlesource.com/c/public/gem5/+/47024
787204c92d876dd81357b75aede52d8ef5e053d3
# A commit which ran flynt on all Python files.
e73655d038cdfa68964109044e33c9a6e7d85ac9


@@ -1,3 +1,121 @@
# Version 23.0
This release has approximately 500 contributions from 50 unique contributors.
Below we highlight key gem5 features and improvements in this release.
## Significant API and user-facing changes
### Major renaming of CPU stats
The CPU stats have been renamed.
See <https://gem5.atlassian.net/browse/GEM5-1304> for details.
Now, each stage (fetch, execute, commit) has its own stat group.
Stats that are shared between the different CPU models (O3, Minor, Simple) now have exactly the same names.
**Important:** Some stat names were misleading before this change.
With this change, stats with the same names between different CPU models have the same meaning.
### `fs.py` and `se.py` deprecated
These scripts have not been well supported for many gem5 releases.
With gem5 23.0, we have officially deprecated these scripts.
They have been moved into the `deprecated` directory, **but they will be removed in a future release.**
As a replacement, we strongly suggest using the gem5 standard library.
See <https://www.gem5.org/documentation/gem5-stdlib/overview> for more information.
### Renaming of `DEBUG` guard into `GEM5_DEBUG`
Scons no longer defines the `DEBUG` guard in debug builds, so code making use of it should use `GEM5_DEBUG` instead.
### Other API changes
Also, this release:
- Removes deprecated namespaces. Namespace names were updated a couple of releases ago. This release removes the old names.
- Adds `MemberEventWrapper`, which should be used in favor of `EventWrapper` for instance member functions.
- Adds an extension mechanism to `Packet` and `Request`.
- Sets the x86 CPU vendor string to "HygonGenuine" to better support GLIBC.
## New features and improvements
### Large improvements to gem5 resources and gem5 resources website
We now have a new web portal for the gem5 resources: <https://resources.gem5.org>
This web portal will allow users to browse the resources available (e.g., disk images, kernels, workloads, binaries, simpoints, etc.) to use out-of-the-box with the gem5 standard library.
You can filter based on architecture, resource type, and compatible gem5 versions.
For each resource, there are examples of how to use the resource and pointers to examples using the resource in the gem5 codebase.
More information can be found on gem5's website: <https://www.gem5.org/documentation/general_docs/gem5_resources/>
We will be expanding gem5 resources with more workloads and resources over the course of the next release.
If you would like to contribute to gem5 resources by uploading your own workloads, disk images, etc., please create an issue on GitHub.
In addition to the new gem5 Resources web portal, the gem5 Resources API has been significantly updated and improved.
There are now much simpler functions for getting resources, such as `obtain_resource(<name>)`, which will download the resource by name and return a reference that can be used (e.g., as a binary in the `set_se_workload` function on the board).
As such, the generic `Resource` class has been deprecated and will be removed in a future release.
Resources are now specialized for their particular category.
For example, there is now a `BinaryResource` class, which is returned when a user specifies a binary resource via the `obtain_resource` function.
This allows for resource typing and for greater resource specialization.
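The typed-resource dispatch described here can be sketched as a minimal mock. The registry and resource names below are illustrative stand-ins, not gem5's actual implementation; only the dispatch pattern mirrors the stdlib API:

```python
# Minimal sketch of category-specific resource classes. The catalog is a
# hypothetical stand-in for the real gem5 resources database.

class AbstractResource:
    def __init__(self, name):
        self.name = name

class BinaryResource(AbstractResource):
    pass

class DiskImageResource(AbstractResource):
    pass

# Hypothetical catalog mapping resource names to their categories.
_CATALOG = {
    "x86-hello64-static": BinaryResource,
    "x86-ubuntu-18.04-img": DiskImageResource,
}

def obtain_resource(name):
    """Return a resource object specialized for its category."""
    try:
        return _CATALOG[name](name)
    except KeyError:
        raise KeyError(f"Unknown resource: {name}")

binary = obtain_resource("x86-hello64-static")
print(type(binary).__name__)  # BinaryResource
```

The benefit of returning a specialized class is that each category can carry its own helpers (e.g., a disk-image resource knowing its root partition) while `obtain_resource` stays a single entry point.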
### Arm ISA improvements
This release adds architectural support for the Armv9 [Scalable Matrix Extension](https://developer.arm.com/documentation/ddi0616/latest) (FEAT_SME).
The implementation employs a simple renaming scheme for the Za array register in the O3 CPU, so that writes to different tiles in the register are considered a dependency and are therefore serialized.
The following SVE and SIMD & FP extensions have also been implemented:
* FEAT_F64MM
* FEAT_F32MM
* FEAT_DOTPROD
* FEAT_I8MM
And more generally:
* FEAT_TLBIOS
* FEAT_FLAGM
* FEAT_FLAGM2
* FEAT_RNG
* FEAT_RNG_TRAP
* FEAT_EVT
### Support for DRAMSys
gem5 can now use DRAMSys <https://github.com/tukl-msd/DRAMSys> as a DRAM backend.
### RISC-V improvements
This release:
- Fully implements RISC-V scalar cryptography extensions.
- Fully implements RISC-V rv32.
- Implements PMP lock features.
- Adds general RISC-V improvements to provide better stability.
### Standard library improvements and new components
This release:
- Adds a MESI_Three_Level component.
- Supports ELFies and LoopPoint analysis output from Sniper.
- Supports DRAMSys in the stdlib.
## Bugfixes and other small improvements
This release also:
- Removes deprecated python libraries.
- Adds a DDR5 model.
- Adds AMD GPU MI200/gfx90a support.
- Changes building so it no longer "duplicates sources" in build/ which improves support for some IDEs and code analysis. If you still need to duplicate sources you can use the `--duplicate-sources` option to `scons`.
- Enables `--debug-activate=<object name>` to use debug trace for only a single SimObject (the opposite of `--debug-ignore`). See `--debug-help` for more information.
- Adds support to exit the simulation loop based on Arm-PMU events.
- Supports Python 3.11.
- Adds the idea of a CpuCluster to gem5.
# Version 22.1.0.0
This release has 500 contributions from 48 unique contributors and marks our second major release of 2022.


@@ -1,6 +1,6 @@
# -*- mode:python -*-
# Copyright (c) 2013, 2015-2020 ARM Limited
# Copyright (c) 2013, 2015-2020, 2023 ARM Limited
# All rights reserved.
#
# The license below extends only to copyright in the software and shall
@@ -145,6 +145,15 @@ AddOption('--gprof', action='store_true',
help='Enable support for the gprof profiler')
AddOption('--pprof', action='store_true',
help='Enable support for the pprof profiler')
# Default to --no-duplicate-sources, but keep --duplicate-sources to opt-out
# of this new build behaviour in case it introduces regressions. We could use
# action=argparse.BooleanOptionalAction here once Python 3.9 is required.
AddOption('--duplicate-sources', action='store_true', default=False,
dest='duplicate_sources',
help='Create symlinks to sources in the build directory')
AddOption('--no-duplicate-sources', action='store_false',
dest='duplicate_sources',
help='Do not create symlinks to sources in the build directory')
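The comment above mentions `argparse.BooleanOptionalAction` as a future replacement for the paired flags once Python 3.9 is required. A standalone sketch of the equivalence (plain `argparse` rather than SCons's `AddOption`):

```python
import argparse

# BooleanOptionalAction (Python 3.9+) generates both --duplicate-sources
# and --no-duplicate-sources from a single declaration, replacing the
# pair of AddOption calls above.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--duplicate-sources",
    action=argparse.BooleanOptionalAction,
    default=False,
    help="Create symlinks to sources in the build directory",
)

assert parser.parse_args([]).duplicate_sources is False
assert parser.parse_args(["--duplicate-sources"]).duplicate_sources is True
assert parser.parse_args(["--no-duplicate-sources"]).duplicate_sources is False
```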
# Inject the built_tools directory into the python path.
sys.path[1:1] = [ Dir('#build_tools').abspath ]
@@ -168,6 +177,10 @@ SetOption('warn', 'no-duplicate-environment')
Export('MakeAction')
# Patch re.compile to support inline flags anywhere within a RE
# string. Required to use PLY with Python 3.11+.
gem5_scons.patch_re_compile_for_inline_flags()
########################################################################
#
# Set up the main build environment.
@@ -264,6 +277,8 @@ main.Append(CPPPATH=[Dir('ext')])
# Add shared top-level headers
main.Prepend(CPPPATH=Dir('include'))
if not GetOption('duplicate_sources'):
main.Prepend(CPPPATH=Dir('src'))
########################################################################
@@ -290,6 +305,17 @@ main['CLANG'] = CXX_version and CXX_version.find('clang') >= 0
if main['GCC'] + main['CLANG'] > 1:
error('Two compilers enabled at once?')
# Find the gem5 binary target architecture (usually host architecture). The
# "Target: <target>" is consistent across gcc and clang at the time of
# writing this.
bin_target_arch = readCommand([main['CXX'], '--verbose'], exception=False)
main["BIN_TARGET_ARCH"] = (
"x86_64"
if bin_target_arch.find("Target: x86_64") != -1
else "aarch64"
if bin_target_arch.find("Target: aarch64") != -1
else "unknown"
)
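The detection logic above boils down to scanning the compiler's `--verbose` output for a `Target:` line. A self-contained sketch of that parse (the sample output fragments are illustrative):

```python
def parse_bin_target_arch(verbose_output: str) -> str:
    """Mirror the Target:-line scan above that picks the gem5
    binary target architecture from `$CXX --verbose` output."""
    if "Target: x86_64" in verbose_output:
        return "x86_64"
    if "Target: aarch64" in verbose_output:
        return "aarch64"
    return "unknown"

# Illustrative gcc/clang --verbose fragments.
assert parse_bin_target_arch("gcc ...\nTarget: x86_64-linux-gnu\n...") == "x86_64"
assert parse_bin_target_arch("Target: aarch64-unknown-linux-gnu") == "aarch64"
assert parse_bin_target_arch("Target: riscv64-linux-gnu") == "unknown"
```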
########################################################################
#
@@ -420,6 +446,8 @@ for variant_path in variant_paths:
conf.CheckLinkFlag('-Wl,--threads')
conf.CheckLinkFlag(
'-Wl,--thread-count=%d' % GetOption('num_jobs'))
else:
error('\n'.join((
"Don't know what compiler options to use for your compiler.",
@@ -439,10 +467,6 @@ for variant_path in variant_paths:
error('gcc version 7 or newer required.\n'
'Installed version:', env['CXXVERSION'])
with gem5_scons.Configure(env) as conf:
# This warning has a false positive in the systemc in g++ 11.1.
conf.CheckCxxFlag('-Wno-free-nonheap-object')
# Add the appropriate Link-Time Optimization (LTO) flags if
# `--with-lto` is set.
if GetOption('with_lto'):
@@ -464,6 +488,17 @@ for variant_path in variant_paths:
'-fno-builtin-malloc', '-fno-builtin-calloc',
'-fno-builtin-realloc', '-fno-builtin-free'])
if compareVersions(env['CXXVERSION'], "9") < 0:
# `libstdc++fs` must be explicitly linked for `std::filesystem`
# in GCC version 8. As of GCC version 9, this is not required.
#
# In GCC 7, explicit linkage of the `libstdc++fs` library is also
# required, but `std::filesystem` is under the `experimental`
# namespace (`std::experimental::filesystem`).
#
# Note: gem5 does not support GCC versions < 7.
env.Append(LIBS=['stdc++fs'])
elif env['CLANG']:
if compareVersions(env['CXXVERSION'], "6") < 0:
error('clang version 6 or newer required.\n'
@@ -481,6 +516,18 @@ for variant_path in variant_paths:
env.Append(TCMALLOC_CCFLAGS=['-fno-builtin'])
if compareVersions(env['CXXVERSION'], "11") < 0:
# `libstdc++fs` must be explicitly linked for `std::filesystem`
# in clang versions 6 through 10.
#
# In addition, for these versions, the
# `std::filesystem` is under the `experimental`
# namespace (`std::experimental::filesystem`).
#
# Note: gem5 does not support clang versions < 6.
env.Append(LIBS=['stdc++fs'])
# On Mac OS X/Darwin we need to also use libc++ (part of XCode) as
# opposed to libstdc++, as the latter is dated.
if sys.platform == "darwin":
@@ -511,7 +558,38 @@ for variant_path in variant_paths:
if env['GCC'] or env['CLANG']:
env.Append(CCFLAGS=['-fsanitize=%s' % sanitizers,
'-fno-omit-frame-pointer'],
LINKFLAGS='-fsanitize=%s' % sanitizers)
LINKFLAGS=['-fsanitize=%s' % sanitizers,
'-static-libasan'])
if main["BIN_TARGET_ARCH"] == "x86_64":
# Sanitizers can enlarge binary size dramatically, north of
# 2GB. This can prevent successful linkage due to symbol
# relocation outside of the 2GB region allocated by the small
# x86_64 code model that is enabled by default (32-bit relative
# offset limitation). Switching to the medium model in x86_64
# enables 64-bit relative offset for large objects (>64KB by
# default) while sticking to 32-bit relative addressing for
# code and smaller objects. Note this comes at a potential
# performance cost so it should not be enabled in all cases.
# This should still be a very happy medium for
# non-perf-critical sanitized builds.
env.Append(CCFLAGS='-mcmodel=medium')
env.Append(LINKFLAGS='-mcmodel=medium')
elif main["BIN_TARGET_ARCH"] == "aarch64":
# aarch64 default code model is small but with different
# constraints than for x86_64. With aarch64, the small code
# model enables 4GB distance between symbols. This is
# sufficient for the largest ALL/gem5.debug target with all
# sanitizers enabled at the time of writing this. Note that
# the next aarch64 code model is "large" which prevents dynamic
# linkage so it should be avoided when possible.
pass
else:
warning(
"Unknown code model options for your architecture. "
"Linkage might fail for larger binaries "
"(e.g., ALL/gem5.debug with sanitizers enabled)."
)
else:
warning("Don't know how to enable %s sanitizer(s) for your "
"compiler." % sanitizers)
@@ -563,9 +641,9 @@ for variant_path in variant_paths:
if not GetOption('without_tcmalloc'):
with gem5_scons.Configure(env) as conf:
if conf.CheckLib('tcmalloc'):
if conf.CheckLib('tcmalloc_minimal'):
conf.env.Append(CCFLAGS=conf.env['TCMALLOC_CCFLAGS'])
elif conf.CheckLib('tcmalloc_minimal'):
elif conf.CheckLib('tcmalloc'):
conf.env.Append(CCFLAGS=conf.env['TCMALLOC_CCFLAGS'])
else:
warning("You can get a 12% performance improvement by "
@@ -728,11 +806,13 @@ Build variables for {dir}:
build_dir = os.path.relpath(root, ext_dir)
SConscript(os.path.join(root, 'SConscript'),
variant_dir=os.path.join(variant_ext, build_dir),
exports=exports)
exports=exports,
duplicate=GetOption('duplicate_sources'))
# The src/SConscript file sets up the build rules in 'env' according
# to the configured variables. It returns a list of environments,
# one for each variant build (debug, opt, etc.)
SConscript('src/SConscript', variant_dir=variant_path, exports=exports)
SConscript('src/SConscript', variant_dir=variant_path, exports=exports,
duplicate=GetOption('duplicate_sources'))
atexit.register(summarize_warnings)


@@ -86,10 +86,10 @@ For instance, if you want to run only with `gem5.opt`, you can use
./main.py run --variant opt
```
Or, if you want to just run X86 tests with the `gem5.opt` binary:
Or, if you want to just run quick tests with the `gem5.opt` binary:
```shell
./main.py run --length quick --variant opt --isa X86
./main.py run --length quick --variant opt
```
@@ -102,6 +102,14 @@ To view all of the available tags, use
The output is split into tag *types* (e.g., isa, variant, length) and the
tags for each type are listed after the type name.
Note that tests were traditionally sorted by isa tag based on which
compilation each required. However, tests have now switched to the ALL
compilation, which includes every ISA, so they no longer need to be compiled
for each ISA individually. Using the isa tag for ISAs other than ALL has
therefore become a less useful way of searching for tests; it is better to
run subsets of tests based on their directories, as described above.
You can specify "or" between tags within the same type by using the tag flag
multiple times. For instance, to run everything that is tagged "opt" or "fast"
use
@@ -112,10 +120,10 @@ use
You can also specify "and" between different types of tags by specifying more
than one type on the command line. For instance, this will only run tests with
both the "X86" and "opt" tags.
both the "ALL" and "opt" tags.
```shell
./main.py run --isa X86 --variant opt
./main.py run --isa ALL --variant opt
```
## Running tests in batch


@@ -255,9 +255,7 @@ for param in sim_object._params.values():
code('} else if (name == "${{param.name}}") {')
code.indent()
code("${{param.name}}.clear();")
code(
"for (auto i = values.begin(); " "ret && i != values.end(); i ++)"
)
code("for (auto i = values.begin(); ret && i != values.end(); i ++)")
code("{")
code.indent()
code("${{param.ptype.cxx_type}} elem;")


@@ -82,7 +82,6 @@ code(
namespace gem5
{
GEM5_DEPRECATED_NAMESPACE(Debug, debug);
namespace debug
{


@@ -87,7 +87,7 @@ namespace gem5
)
if enum.wrapper_is_struct:
code("const char *${wrapper_name}::${name}Strings" "[Num_${name}] =")
code("const char *${wrapper_name}::${name}Strings[Num_${name}] =")
else:
if enum.is_class:
code(
@@ -97,8 +97,7 @@ const char *${name}Strings[static_cast<int>(${name}::Num_${name})] =
)
else:
code(
"""GEM5_DEPRECATED_NAMESPACE(Enums, enums);
namespace enums
"""namespace enums
{"""
)
code.indent(1)


@@ -48,7 +48,9 @@ interpreters, and so the exact same interpreter should be used both to run
this script, and to read in and execute the marshalled code later.
"""
import locale
import marshal
import os
import sys
import zlib
@@ -65,6 +67,11 @@ if len(sys.argv) < 4:
print(f"Usage: {sys.argv[0]} CPP PY MODPATH ABSPATH", file=sys.stderr)
sys.exit(1)
# Set Python's locale settings manually based on the `LC_CTYPE`
# environment variable.
if "LC_CTYPE" in os.environ:
locale.setlocale(locale.LC_CTYPE, os.environ["LC_CTYPE"])
_, cpp, python, modpath, abspath = sys.argv
with open(python, "r") as f:


@@ -60,15 +60,15 @@ def _get_hwp(hwp_option):
def _get_cache_opts(level, options):
opts = {}
size_attr = "{}_size".format(level)
size_attr = f"{level}_size"
if hasattr(options, size_attr):
opts["size"] = getattr(options, size_attr)
assoc_attr = "{}_assoc".format(level)
assoc_attr = f"{level}_assoc"
if hasattr(options, assoc_attr):
opts["assoc"] = getattr(options, assoc_attr)
prefetcher_attr = "{}_hwp_type".format(level)
prefetcher_attr = f"{level}_hwp_type"
if hasattr(options, prefetcher_attr):
opts["prefetcher"] = _get_hwp(getattr(options, prefetcher_attr))


@@ -51,7 +51,7 @@ from shutil import rmtree, copyfile
def hex_mask(terms):
dec_mask = reduce(operator.or_, [2**i for i in terms], 0)
return "%08x" % dec_mask
return f"{dec_mask:08x}"
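The before/after forms of `hex_mask` in this hunk are behaviorally identical; here is the converted function as a runnable whole:

```python
import operator
from functools import reduce

def hex_mask(terms):
    """OR together the bit positions in `terms` and format the result
    as 8 lowercase hex digits (the f-string form adopted above)."""
    dec_mask = reduce(operator.or_, [2**i for i in terms], 0)
    return f"{dec_mask:08x}"

# Bits 0, 1, and 4 set -> 0b10011 == 0x13.
assert hex_mask([0, 1, 4]) == "00000013"
assert hex_mask([]) == "00000000"
```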
def file_append(path, contents):
@@ -252,13 +252,13 @@ def _redirect_paths(options):
# Redirect filesystem syscalls from src to the first matching dests
redirect_paths = [
RedirectPath(
app_path="/proc", host_paths=["%s/fs/proc" % m5.options.outdir]
app_path="/proc", host_paths=[f"{m5.options.outdir}/fs/proc"]
),
RedirectPath(
app_path="/sys", host_paths=["%s/fs/sys" % m5.options.outdir]
app_path="/sys", host_paths=[f"{m5.options.outdir}/fs/sys"]
),
RedirectPath(
app_path="/tmp", host_paths=["%s/fs/tmp" % m5.options.outdir]
app_path="/tmp", host_paths=[f"{m5.options.outdir}/fs/tmp"]
),
]
@@ -275,7 +275,7 @@ def _redirect_paths(options):
if chroot:
redirect_paths.append(
RedirectPath(
app_path="/", host_paths=["%s" % os.path.expanduser(chroot)]
app_path="/", host_paths=[f"{os.path.expanduser(chroot)}"]
)
)


@@ -204,8 +204,8 @@ def config_tlb_hierarchy(
# add the different TLB levels to the system
# Modify here if you want to make the TLB hierarchy a child of
# the shader.
exec("system.%s = TLB_array" % system_TLB_name)
exec("system.%s = Coalescer_array" % system_Coalescer_name)
exec(f"system.{system_TLB_name} = TLB_array")
exec(f"system.{system_Coalescer_name} = Coalescer_array")
# ===========================================================
# Specify the TLB hierarchy (i.e., port connections)
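The `exec(f"system.{name} = ...")` pattern above assigns to a dynamically named attribute; the same effect is available without `exec` via `setattr`. This is a general Python note, not a change made in this commit, and the names below are stand-ins:

```python
# Hypothetical stand-ins for the real system and TLB objects.
class System:
    pass

system = System()
tlb_name = "l1_tlb"            # dynamically chosen attribute name
TLB_array = ["tlb0", "tlb1"]   # stand-in for the real TLB array

# Equivalent to: exec(f"system.{tlb_name} = TLB_array")
setattr(system, tlb_name, TLB_array)

assert system.l1_tlb == ["tlb0", "tlb1"]
```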


@@ -65,22 +65,18 @@ class ObjectList(object):
sub_cls = self._sub_classes[real_name]
return sub_cls
except KeyError:
print(
"{} is not a valid sub-class of {}.".format(
name, self.base_cls
)
)
print(f"{name} is not a valid sub-class of {self.base_cls}.")
raise
def print(self):
"""Print a list of available sub-classes and aliases."""
print("Available {} classes:".format(self.base_cls))
print(f"Available {self.base_cls} classes:")
doc_wrapper = TextWrapper(
initial_indent="\t\t", subsequent_indent="\t\t"
)
for name, cls in list(self._sub_classes.items()):
print("\t{}".format(name))
print(f"\t{name}")
# Try to extract the class documentation from the class help
# string.
@@ -92,7 +88,7 @@ class ObjectList(object):
if self._aliases:
print("\nAliases:")
for alias, target in list(self._aliases.items()):
print("\t{} => {}".format(alias, target))
print(f"\t{alias} => {target}")
def get_names(self):
"""Return a list of valid sub-class names and aliases."""


@@ -217,7 +217,7 @@ def addNoISAOptions(parser):
"--maxtime",
type=float,
default=None,
help="Run to the specified absolute simulated time in " "seconds",
help="Run to the specified absolute simulated time in seconds",
)
parser.add_argument(
"-P",
@@ -691,7 +691,7 @@ def addSEOptions(parser):
"-o",
"--options",
default="",
help="""The options to pass to the binary, use " "
help="""The options to pass to the binary, use
around the entire string""",
)
parser.add_argument(
@@ -834,8 +834,7 @@ def addFSOptions(parser):
action="store",
type=str,
dest="benchmark",
help="Specify the benchmark to run. Available benchmarks: %s"
% DefinedBenchmarks,
help=f"Specify the benchmark to run. Available benchmarks: {DefinedBenchmarks}",
)
# Metafile options


@@ -71,7 +71,7 @@ def setCPUClass(options):
TmpClass, test_mem_mode = getCPUClass(options.cpu_type)
CPUClass = None
if TmpClass.require_caches() and not options.caches and not options.ruby:
fatal("%s must be used with caches" % options.cpu_type)
fatal(f"{options.cpu_type} must be used with caches")
if options.checkpoint_restore != None:
if options.restore_with_cpu != options.cpu_type:
@@ -144,7 +144,7 @@ def findCptDir(options, cptdir, testsys):
fatal("Unable to find simpoint")
inst += int(testsys.cpu[0].workload[0].simpoint)
checkpoint_dir = joinpath(cptdir, "cpt.%s.%s" % (options.bench, inst))
checkpoint_dir = joinpath(cptdir, f"cpt.{options.bench}.{inst}")
if not exists(checkpoint_dir):
fatal("Unable to find checkpoint directory %s", checkpoint_dir)
@@ -204,7 +204,7 @@ def findCptDir(options, cptdir, testsys):
fatal("Checkpoint %d not found", cpt_num)
cpt_starttick = int(cpts[cpt_num - 1])
checkpoint_dir = joinpath(cptdir, "cpt.%s" % cpts[cpt_num - 1])
checkpoint_dir = joinpath(cptdir, f"cpt.{cpts[cpt_num - 1]}")
return cpt_starttick, checkpoint_dir
@@ -220,7 +220,7 @@ def scriptCheckpoints(options, maxtick, cptdir):
print("Creating checkpoint at inst:%d" % (checkpoint_inst))
exit_event = m5.simulate()
exit_cause = exit_event.getCause()
print("exit cause = %s" % exit_cause)
print(f"exit cause = {exit_cause}")
# skip checkpoint instructions should they exist
while exit_cause == "checkpoint":
@@ -549,10 +549,10 @@ def run(options, root, testsys, cpu_class):
if options.repeat_switch:
switch_class = getCPUClass(options.cpu_type)[0]
if switch_class.require_caches() and not options.caches:
print("%s: Must be used with caches" % str(switch_class))
print(f"{str(switch_class)}: Must be used with caches")
sys.exit(1)
if not switch_class.support_take_over():
print("%s: CPU switching not supported" % str(switch_class))
print(f"{str(switch_class)}: CPU switching not supported")
sys.exit(1)
repeat_switch_cpus = [
@@ -740,9 +740,9 @@ def run(options, root, testsys, cpu_class):
)
exit_event = m5.simulate()
else:
print("Switch at curTick count:%s" % str(10000))
print(f"Switch at curTick count:{str(10000)}")
exit_event = m5.simulate(10000)
print("Switched CPUS @ tick %s" % (m5.curTick()))
print(f"Switched CPUS @ tick {m5.curTick()}")
m5.switchCpus(testsys, switch_cpu_list)
@@ -757,7 +757,7 @@ def run(options, root, testsys, cpu_class):
exit_event = m5.simulate()
else:
exit_event = m5.simulate(options.standard_switch)
print("Switching CPUS @ tick %s" % (m5.curTick()))
print(f"Switching CPUS @ tick {m5.curTick()}")
print(
"Simulation ends instruction count:%d"
% (testsys.switch_cpus_1[0].max_insts_any_thread)


@@ -73,9 +73,7 @@ class PathSearchFunc(object):
return next(p for p in paths if os.path.exists(p))
except StopIteration:
raise IOError(
"Can't find file '{}' on {}.".format(
filepath, self.environment_variable
)
f"Can't find file '{filepath}' on {self.environment_variable}."
)
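The search-and-raise pattern in `PathSearchFunc` can be sketched standalone. The environment-variable name here is illustrative:

```python
import os
import tempfile

def first_existing(filepath, search_dirs, env_var="M5_PATH"):
    """Return the first candidate path that exists, mirroring the
    next()/StopIteration pattern above."""
    paths = [os.path.join(d, filepath) for d in search_dirs]
    try:
        return next(p for p in paths if os.path.exists(p))
    except StopIteration:
        raise IOError(f"Can't find file '{filepath}' on {env_var}.")

with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "kernel"), "w").close()
    # Skips the nonexistent directory and finds the file in d.
    assert first_existing("kernel", ["/nonexistent", d]) == os.path.join(d, "kernel")
```

Using `next()` over a generator avoids building the whole list of existing paths, and the `StopIteration` handler converts "no match" into a domain-specific error message.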


@@ -1420,6 +1420,7 @@ class HPI_FloatSimdFU(MinorFU):
"SimdMisc",
"SimdMult",
"SimdMultAcc",
"SimdMatMultAcc",
"SimdShift",
"SimdShiftAcc",
"SimdSqrt",
@@ -1431,6 +1432,7 @@ class HPI_FloatSimdFU(MinorFU):
"SimdFloatMisc",
"SimdFloatMult",
"SimdFloatMultAcc",
"SimdFloatMatMultAcc",
"SimdFloatSqrt",
]
)


@@ -53,6 +53,7 @@ class O3_ARM_v7a_FP(FUDesc):
OpDesc(opClass="SimdMisc", opLat=3),
OpDesc(opClass="SimdMult", opLat=5),
OpDesc(opClass="SimdMultAcc", opLat=5),
OpDesc(opClass="SimdMatMultAcc", opLat=5),
OpDesc(opClass="SimdShift", opLat=3),
OpDesc(opClass="SimdShiftAcc", opLat=3),
OpDesc(opClass="SimdSqrt", opLat=9),
@@ -64,6 +65,7 @@ class O3_ARM_v7a_FP(FUDesc):
OpDesc(opClass="SimdFloatMisc", opLat=3),
OpDesc(opClass="SimdFloatMult", opLat=3),
OpDesc(opClass="SimdFloatMultAcc", opLat=5),
OpDesc(opClass="SimdFloatMatMultAcc", opLat=5),
OpDesc(opClass="SimdFloatSqrt", opLat=9),
OpDesc(opClass="FloatAdd", opLat=5),
OpDesc(opClass="FloatCmp", opLat=5),


@@ -56,6 +56,7 @@ class ex5_LITTLE_FP(MinorDefaultFloatSimdFU):
OpDesc(opClass="SimdMisc", opLat=3),
OpDesc(opClass="SimdMult", opLat=4),
OpDesc(opClass="SimdMultAcc", opLat=5),
OpDesc(opClass="SimdMatMultAcc", opLat=5),
OpDesc(opClass="SimdShift", opLat=3),
OpDesc(opClass="SimdShiftAcc", opLat=3),
OpDesc(opClass="SimdSqrt", opLat=9),
@@ -67,6 +68,7 @@ class ex5_LITTLE_FP(MinorDefaultFloatSimdFU):
OpDesc(opClass="SimdFloatMisc", opLat=6),
OpDesc(opClass="SimdFloatMult", opLat=15),
OpDesc(opClass="SimdFloatMultAcc", opLat=6),
OpDesc(opClass="SimdFloatMatMultAcc", opLat=6),
OpDesc(opClass="SimdFloatSqrt", opLat=17),
OpDesc(opClass="FloatAdd", opLat=8),
OpDesc(opClass="FloatCmp", opLat=6),


@@ -58,6 +58,7 @@ class ex5_big_FP(FUDesc):
OpDesc(opClass="SimdMisc", opLat=3),
OpDesc(opClass="SimdMult", opLat=6),
OpDesc(opClass="SimdMultAcc", opLat=5),
OpDesc(opClass="SimdMatMultAcc", opLat=5),
OpDesc(opClass="SimdShift", opLat=3),
OpDesc(opClass="SimdShiftAcc", opLat=3),
OpDesc(opClass="SimdSqrt", opLat=9),
@@ -69,6 +70,7 @@ class ex5_big_FP(FUDesc):
OpDesc(opClass="SimdFloatMisc", opLat=3),
OpDesc(opClass="SimdFloatMult", opLat=6),
OpDesc(opClass="SimdFloatMultAcc", opLat=1),
OpDesc(opClass="SimdFloatMatMultAcc", opLat=1),
OpDesc(opClass="SimdFloatSqrt", opLat=9),
OpDesc(opClass="FloatAdd", opLat=6),
OpDesc(opClass="FloatCmp", opLat=5),


@@ -83,7 +83,7 @@ class Benchmark(object):
self.args = []
if not hasattr(self.__class__, "output"):
self.output = "%s.out" % self.name
self.output = f"{self.name}.out"
if not hasattr(self.__class__, "simpoint"):
self.simpoint = None
@@ -92,13 +92,12 @@ class Benchmark(object):
func = getattr(self.__class__, input_set)
except AttributeError:
raise AttributeError(
"The benchmark %s does not have the %s input set"
% (self.name, input_set)
f"The benchmark {self.name} does not have the {input_set} input set"
)
executable = joinpath(spec_dist, "binaries", isa, os, self.binary)
if not isfile(executable):
raise AttributeError("%s not found" % executable)
raise AttributeError(f"{executable} not found")
self.executable = executable
# root of tree for input & output data files
@@ -112,7 +111,7 @@ class Benchmark(object):
self.input_set = input_set
if not isdir(inputs_dir):
raise AttributeError("%s not found" % inputs_dir)
raise AttributeError(f"{inputs_dir} not found")
self.inputs_dir = [inputs_dir]
if isdir(all_dir):
@@ -121,12 +120,12 @@ class Benchmark(object):
self.outputs_dir = outputs_dir
if not hasattr(self.__class__, "stdin"):
self.stdin = joinpath(inputs_dir, "%s.in" % self.name)
self.stdin = joinpath(inputs_dir, f"{self.name}.in")
if not isfile(self.stdin):
self.stdin = None
if not hasattr(self.__class__, "stdout"):
self.stdout = joinpath(outputs_dir, "%s.out" % self.name)
self.stdout = joinpath(outputs_dir, f"{self.name}.out")
if not isfile(self.stdout):
self.stdout = None
@@ -387,9 +386,9 @@ class mesa(Benchmark):
"-frames",
frames,
"-meshfile",
"%s.in" % self.name,
f"{self.name}.in",
"-ppmfile",
"%s.ppm" % self.name,
f"{self.name}.ppm",
]
def test(self, isa, os):
@@ -876,34 +875,34 @@ class vortex(Benchmark):
elif isa == "sparc" or isa == "sparc32":
self.endian = "bendian"
else:
raise AttributeError("unknown ISA %s" % isa)
raise AttributeError(f"unknown ISA {isa}")
super(vortex, self).__init__(isa, os, input_set)
def test(self, isa, os):
self.args = ["%s.raw" % self.endian]
self.args = [f"{self.endian}.raw"]
self.output = "vortex.out"
def train(self, isa, os):
self.args = ["%s.raw" % self.endian]
self.args = [f"{self.endian}.raw"]
self.output = "vortex.out"
def smred(self, isa, os):
self.args = ["%s.raw" % self.endian]
self.args = [f"{self.endian}.raw"]
self.output = "vortex.out"
def mdred(self, isa, os):
self.args = ["%s.raw" % self.endian]
self.args = [f"{self.endian}.raw"]
self.output = "vortex.out"
def lgred(self, isa, os):
self.args = ["%s.raw" % self.endian]
self.args = [f"{self.endian}.raw"]
self.output = "vortex.out"
class vortex1(vortex):
def ref(self, isa, os):
self.args = ["%s1.raw" % self.endian]
self.args = [f"{self.endian}1.raw"]
self.output = "vortex1.out"
self.simpoint = 271 * 100e6
@@ -911,14 +910,14 @@ class vortex1(vortex):
class vortex2(vortex):
def ref(self, isa, os):
self.simpoint = 1024 * 100e6
self.args = ["%s2.raw" % self.endian]
self.args = [f"{self.endian}2.raw"]
self.output = "vortex2.out"
class vortex3(vortex):
def ref(self, isa, os):
self.simpoint = 564 * 100e6
self.args = ["%s3.raw" % self.endian]
self.args = [f"{self.endian}3.raw"]
self.output = "vortex3.out"
@@ -1031,8 +1030,8 @@ if __name__ == "__main__":
for bench in all:
for input_set in "ref", "test", "train":
print("class: %s" % bench.__name__)
print(f"class: {bench.__name__}")
x = bench("x86", "linux", input_set)
print("%s: %s" % (x, input_set))
print(f"{x}: {input_set}")
pprint(x.makeProcessArgs())
print()


@@ -0,0 +1,444 @@
# Copyright (c) 2010-2013, 2016, 2019-2020 ARM Limited
# Copyright (c) 2020 Barkhausen Institut
# All rights reserved.
#
# The license below extends only to copyright in the software and shall
# not be construed as granting a license to any other intellectual
# property including but not limited to intellectual property relating
# to a hardware implementation of the functionality of the software
# licensed hereunder. You may use the software subject to the license
# terms below provided that you ensure that this notice is replicated
# unmodified and in its entirety in all distributions of the software,
# modified or unmodified, in source code or in binary form.
#
# Copyright (c) 2012-2014 Mark D. Hill and David A. Wood
# Copyright (c) 2009-2011 Advanced Micro Devices, Inc.
# Copyright (c) 2006-2007 The Regents of The University of Michigan
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer;
# redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution;
# neither the name of the copyright holders nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
import argparse
import sys
import m5
from m5.defines import buildEnv
from m5.objects import *
from m5.util import addToPath, fatal, warn
from m5.util.fdthelper import *
from gem5.isas import ISA
from gem5.runtime import get_runtime_isa
addToPath("../../")
from ruby import Ruby
from common.FSConfig import *
from common.SysPaths import *
from common.Benchmarks import *
from common import Simulation
from common import CacheConfig
from common import CpuConfig
from common import MemConfig
from common import ObjectList
from common.Caches import *
from common import Options
def cmd_line_template():
if args.command_line and args.command_line_file:
print(
"Error: --command-line and --command-line-file are "
"mutually exclusive"
)
sys.exit(1)
if args.command_line:
return args.command_line
if args.command_line_file:
return open(args.command_line_file).read().strip()
return None
def build_test_system(np):
cmdline = cmd_line_template()
isa = get_runtime_isa()
if isa == ISA.MIPS:
test_sys = makeLinuxMipsSystem(test_mem_mode, bm[0], cmdline=cmdline)
elif isa == ISA.SPARC:
test_sys = makeSparcSystem(test_mem_mode, bm[0], cmdline=cmdline)
elif isa == ISA.RISCV:
test_sys = makeBareMetalRiscvSystem(
test_mem_mode, bm[0], cmdline=cmdline
)
elif isa == ISA.X86:
test_sys = makeLinuxX86System(
test_mem_mode, np, bm[0], args.ruby, cmdline=cmdline
)
elif isa == ISA.ARM:
test_sys = makeArmSystem(
test_mem_mode,
args.machine_type,
np,
bm[0],
args.dtb_filename,
bare_metal=args.bare_metal,
cmdline=cmdline,
external_memory=args.external_memory_system,
ruby=args.ruby,
vio_9p=args.vio_9p,
bootloader=args.bootloader,
)
if args.enable_context_switch_stats_dump:
test_sys.enable_context_switch_stats_dump = True
else:
fatal("Incapable of building %s full system!", isa.name)
# Set the cache line size for the entire system
test_sys.cache_line_size = args.cacheline_size
# Create a top-level voltage domain
test_sys.voltage_domain = VoltageDomain(voltage=args.sys_voltage)
# Create a source clock for the system and set the clock period
test_sys.clk_domain = SrcClockDomain(
clock=args.sys_clock, voltage_domain=test_sys.voltage_domain
)
# Create a CPU voltage domain
test_sys.cpu_voltage_domain = VoltageDomain()
# Create a source clock for the CPUs and set the clock period
test_sys.cpu_clk_domain = SrcClockDomain(
clock=args.cpu_clock, voltage_domain=test_sys.cpu_voltage_domain
)
if buildEnv["USE_RISCV_ISA"]:
test_sys.workload.bootloader = args.kernel
elif args.kernel is not None:
test_sys.workload.object_file = binary(args.kernel)
if args.script is not None:
test_sys.readfile = args.script
test_sys.init_param = args.init_param
# For now, assign all the CPUs to the same clock domain
test_sys.cpu = [
TestCPUClass(clk_domain=test_sys.cpu_clk_domain, cpu_id=i)
for i in range(np)
]
if args.ruby:
bootmem = getattr(test_sys, "_bootmem", None)
Ruby.create_system(
args, True, test_sys, test_sys.iobus, test_sys._dma_ports, bootmem
)
# Create a separate clock domain for Ruby
test_sys.ruby.clk_domain = SrcClockDomain(
clock=args.ruby_clock, voltage_domain=test_sys.voltage_domain
)
# Connect the ruby io port to the PIO bus,
# assuming that there is just one such port.
test_sys.iobus.mem_side_ports = test_sys.ruby._io_port.in_ports
for (i, cpu) in enumerate(test_sys.cpu):
#
# Tie the cpu ports to the correct ruby system ports
#
cpu.clk_domain = test_sys.cpu_clk_domain
cpu.createThreads()
cpu.createInterruptController()
test_sys.ruby._cpu_ports[i].connectCpuPorts(cpu)
else:
if args.caches or args.l2cache:
# By default the IOCache runs at the system clock
test_sys.iocache = IOCache(addr_ranges=test_sys.mem_ranges)
test_sys.iocache.cpu_side = test_sys.iobus.mem_side_ports
test_sys.iocache.mem_side = test_sys.membus.cpu_side_ports
elif not args.external_memory_system:
test_sys.iobridge = Bridge(
delay="50ns", ranges=test_sys.mem_ranges
)
test_sys.iobridge.cpu_side_port = test_sys.iobus.mem_side_ports
test_sys.iobridge.mem_side_port = test_sys.membus.cpu_side_ports
# Sanity check
if args.simpoint_profile:
if not ObjectList.is_noncaching_cpu(TestCPUClass):
fatal("SimPoint generation should be done with atomic cpu")
if np > 1:
fatal(
"SimPoint generation not supported with more than one CPUs"
)
for i in range(np):
if args.simpoint_profile:
test_sys.cpu[i].addSimPointProbe(args.simpoint_interval)
if args.checker:
test_sys.cpu[i].addCheckerCpu()
if not ObjectList.is_kvm_cpu(TestCPUClass):
if args.bp_type:
bpClass = ObjectList.bp_list.get(args.bp_type)
test_sys.cpu[i].branchPred = bpClass()
if args.indirect_bp_type:
IndirectBPClass = ObjectList.indirect_bp_list.get(
args.indirect_bp_type
)
test_sys.cpu[
i
].branchPred.indirectBranchPred = IndirectBPClass()
test_sys.cpu[i].createThreads()
# If elastic tracing is enabled when not restoring from checkpoint and
# when not fast forwarding using the atomic cpu, then check that the
# TestCPUClass is DerivO3CPU or inherits from DerivO3CPU. If the check
# passes then attach the elastic trace probe.
# If restoring from checkpoint or fast forwarding, the code that does this for
# FutureCPUClass is in the Simulation module. If the check passes then the
# elastic trace probe is attached to the switch CPUs.
if (
args.elastic_trace_en
and args.checkpoint_restore == None
and not args.fast_forward
):
CpuConfig.config_etrace(TestCPUClass, test_sys.cpu, args)
CacheConfig.config_cache(args, test_sys)
MemConfig.config_mem(args, test_sys)
if ObjectList.is_kvm_cpu(TestCPUClass) or ObjectList.is_kvm_cpu(
FutureClass
):
# Assign KVM CPUs to their own event queues / threads. This
# has to be done after creating caches and other child objects
# since these mustn't inherit the CPU event queue.
for i, cpu in enumerate(test_sys.cpu):
# Child objects usually inherit the parent's event
# queue. Override that and use the same event queue for
# all devices.
for obj in cpu.descendants():
obj.eventq_index = 0
cpu.eventq_index = i + 1
test_sys.kvm_vm = KvmVM()
return test_sys
def build_drive_system(np):
# driver system CPU is always simple, so is the memory
# Note this is an assignment of a class, not an instance.
DriveCPUClass = AtomicSimpleCPU
drive_mem_mode = "atomic"
DriveMemClass = SimpleMemory
cmdline = cmd_line_template()
if buildEnv["USE_MIPS_ISA"]:
drive_sys = makeLinuxMipsSystem(drive_mem_mode, bm[1], cmdline=cmdline)
elif buildEnv["USE_SPARC_ISA"]:
drive_sys = makeSparcSystem(drive_mem_mode, bm[1], cmdline=cmdline)
elif buildEnv["USE_X86_ISA"]:
drive_sys = makeLinuxX86System(
drive_mem_mode, np, bm[1], cmdline=cmdline
)
elif buildEnv["USE_ARM_ISA"]:
drive_sys = makeArmSystem(
drive_mem_mode,
args.machine_type,
np,
bm[1],
args.dtb_filename,
cmdline=cmdline,
)
# Create a top-level voltage domain
drive_sys.voltage_domain = VoltageDomain(voltage=args.sys_voltage)
# Create a source clock for the system and set the clock period
drive_sys.clk_domain = SrcClockDomain(
clock=args.sys_clock, voltage_domain=drive_sys.voltage_domain
)
# Create a CPU voltage domain
drive_sys.cpu_voltage_domain = VoltageDomain()
# Create a source clock for the CPUs and set the clock period
drive_sys.cpu_clk_domain = SrcClockDomain(
clock=args.cpu_clock, voltage_domain=drive_sys.cpu_voltage_domain
)
drive_sys.cpu = DriveCPUClass(
clk_domain=drive_sys.cpu_clk_domain, cpu_id=0
)
drive_sys.cpu.createThreads()
drive_sys.cpu.createInterruptController()
drive_sys.cpu.connectBus(drive_sys.membus)
if args.kernel is not None:
drive_sys.workload.object_file = binary(args.kernel)
if ObjectList.is_kvm_cpu(DriveCPUClass):
drive_sys.kvm_vm = KvmVM()
drive_sys.iobridge = Bridge(delay="50ns", ranges=drive_sys.mem_ranges)
drive_sys.iobridge.cpu_side_port = drive_sys.iobus.mem_side_ports
drive_sys.iobridge.mem_side_port = drive_sys.membus.cpu_side_ports
# Create the appropriate memory controllers and connect them to the
# memory bus
drive_sys.mem_ctrls = [
DriveMemClass(range=r) for r in drive_sys.mem_ranges
]
for i in range(len(drive_sys.mem_ctrls)):
drive_sys.mem_ctrls[i].port = drive_sys.membus.mem_side_ports
drive_sys.init_param = args.init_param
return drive_sys
warn(
"The fs.py script is deprecated. It will be removed in future releases of "
" gem5."
)
# Add args
parser = argparse.ArgumentParser()
Options.addCommonOptions(parser)
Options.addFSOptions(parser)
# Add the ruby specific and protocol specific args
if "--ruby" in sys.argv:
Ruby.define_options(parser)
args = parser.parse_args()
# system under test can be any CPU
(TestCPUClass, test_mem_mode, FutureClass) = Simulation.setCPUClass(args)
# Match the memories with the CPUs, based on the options for the test system
TestMemClass = Simulation.setMemClass(args)
if args.benchmark:
try:
bm = Benchmarks[args.benchmark]
except KeyError:
print(f"Error benchmark {args.benchmark} has not been defined.")
print(f"Valid benchmarks are: {DefinedBenchmarks}")
sys.exit(1)
else:
if args.dual:
bm = [
SysConfig(
disks=args.disk_image,
rootdev=args.root_device,
mem=args.mem_size,
os_type=args.os_type,
),
SysConfig(
disks=args.disk_image,
rootdev=args.root_device,
mem=args.mem_size,
os_type=args.os_type,
),
]
else:
bm = [
SysConfig(
disks=args.disk_image,
rootdev=args.root_device,
mem=args.mem_size,
os_type=args.os_type,
)
]
np = args.num_cpus
test_sys = build_test_system(np)
if len(bm) == 2:
drive_sys = build_drive_system(np)
root = makeDualRoot(True, test_sys, drive_sys, args.etherdump)
elif len(bm) == 1 and args.dist:
# This system is part of a dist-gem5 simulation
root = makeDistRoot(
test_sys,
args.dist_rank,
args.dist_size,
args.dist_server_name,
args.dist_server_port,
args.dist_sync_repeat,
args.dist_sync_start,
args.ethernet_linkspeed,
args.ethernet_linkdelay,
args.etherdump,
)
elif len(bm) == 1:
root = Root(full_system=True, system=test_sys)
else:
print("Error I don't know how to create more than 2 systems.")
sys.exit(1)
if ObjectList.is_kvm_cpu(TestCPUClass) or ObjectList.is_kvm_cpu(FutureClass):
# Required for running kvm on multiple host cores.
# Uses gem5's parallel event queue feature
# Note: The simulator is quite picky about this number!
root.sim_quantum = int(1e9) # 1 ms
if args.timesync:
root.time_sync_enable = True
if args.frame_capture:
VncServer.frame_capture = True
if buildEnv["USE_ARM_ISA"] and not args.bare_metal and not args.dtb_filename:
if args.machine_type not in [
"VExpress_GEM5",
"VExpress_GEM5_V1",
"VExpress_GEM5_V2",
"VExpress_GEM5_Foundation",
]:
warn(
"Can only correctly generate a dtb for VExpress_GEM5_* "
"platforms, unless custom hardware models have been equipped "
"with generation functionality."
)
# Generate a Device Tree
for sysname in ("system", "testsys", "drivesys"):
if hasattr(root, sysname):
sys = getattr(root, sysname)
sys.workload.dtb_filename = os.path.join(
m5.options.outdir, f"{sysname}.dtb"
)
sys.generateDtb(sys.workload.dtb_filename)
if args.wait_gdb:
test_sys.workload.wait_for_remote_gdb = True
Simulation.setWorkCountOptions(test_sys, args)
Simulation.run(args, root, test_sys, FutureClass)

View File

@@ -0,0 +1,292 @@
# Copyright (c) 2012-2013 ARM Limited
# All rights reserved.
#
# The license below extends only to copyright in the software and shall
# not be construed as granting a license to any other intellectual
# property including but not limited to intellectual property relating
# to a hardware implementation of the functionality of the software
# licensed hereunder. You may use the software subject to the license
# terms below provided that you ensure that this notice is replicated
# unmodified and in its entirety in all distributions of the software,
# modified or unmodified, in source code or in binary form.
#
# Copyright (c) 2006-2008 The Regents of The University of Michigan
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer;
# redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution;
# neither the name of the copyright holders nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Simple test script
#
# "m5 test.py"
import argparse
import sys
import os
import m5
from m5.defines import buildEnv
from m5.objects import *
from m5.params import NULL
from m5.util import addToPath, fatal, warn
from gem5.isas import ISA
from gem5.runtime import get_runtime_isa
addToPath("../../")
from ruby import Ruby
from common import Options
from common import Simulation
from common import CacheConfig
from common import CpuConfig
from common import ObjectList
from common import MemConfig
from common.FileSystemConfig import config_filesystem
from common.Caches import *
from common.cpu2000 import *
def get_processes(args):
"""Interprets provided args and returns a list of processes"""
multiprocesses = []
inputs = []
outputs = []
errouts = []
pargs = []
workloads = args.cmd.split(";")
if args.input != "":
inputs = args.input.split(";")
if args.output != "":
outputs = args.output.split(";")
if args.errout != "":
errouts = args.errout.split(";")
if args.options != "":
pargs = args.options.split(";")
idx = 0
for wrkld in workloads:
process = Process(pid=100 + idx)
process.executable = wrkld
process.cwd = os.getcwd()
process.gid = os.getgid()
if args.env:
with open(args.env, "r") as f:
process.env = [line.rstrip() for line in f]
if len(pargs) > idx:
process.cmd = [wrkld] + pargs[idx].split()
else:
process.cmd = [wrkld]
if len(inputs) > idx:
process.input = inputs[idx]
if len(outputs) > idx:
process.output = outputs[idx]
if len(errouts) > idx:
process.errout = errouts[idx]
multiprocesses.append(process)
idx += 1
if args.smt:
assert args.cpu_type == "DerivO3CPU"
return multiprocesses, idx
else:
return multiprocesses, 1
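The indexing in `get_processes()` can be summarized as: workload *i* gets input/output/option string *i* if one was supplied, and bare defaults otherwise. This is a simplified standalone sketch of that pairing (hypothetical helper name, plain lists instead of gem5 `Process` objects):

```python
# Sketch of how get_processes() pairs the ";"-separated --cmd and --options
# strings by index: workload i gets option string i if one exists,
# otherwise just the bare command.
def pair_workloads(cmd, options=""):
    workloads = cmd.split(";")
    pargs = options.split(";") if options else []
    cmds = []
    for idx, wrkld in enumerate(workloads):
        if len(pargs) > idx:
            # Option string idx belongs to workload idx.
            cmds.append([wrkld] + pargs[idx].split())
        else:
            # No options left for this workload.
            cmds.append([wrkld])
    return cmds

print(pair_workloads("ls;cat", "-l"))  # [['ls', '-l'], ['cat']]
```

Note the asymmetry this implies in the real script: with fewer `--options` entries than workloads, trailing workloads simply run with no arguments.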
warn(
"The se.py script is deprecated. It will be removed in future releases of "
" gem5."
)
parser = argparse.ArgumentParser()
Options.addCommonOptions(parser)
Options.addSEOptions(parser)
if "--ruby" in sys.argv:
Ruby.define_options(parser)
args = parser.parse_args()
multiprocesses = []
numThreads = 1
if args.bench:
apps = args.bench.split("-")
if len(apps) != args.num_cpus:
print("number of benchmarks not equal to set num_cpus!")
sys.exit(1)
for app in apps:
try:
if get_runtime_isa() == ISA.ARM:
exec(
"workload = %s('arm_%s', 'linux', '%s')"
% (app, args.arm_iset, args.spec_input)
)
else:
# TARGET_ISA has been removed, but this is missing a ], so it
# has incorrect syntax and wasn't being used anyway.
exec(
"workload = %s(buildEnv['TARGET_ISA', 'linux', '%s')"
% (app, args.spec_input)
)
multiprocesses.append(workload.makeProcess())
except:
print(
f"Unable to find workload for {get_runtime_isa().name()}: {app}",
file=sys.stderr,
)
sys.exit(1)
elif args.cmd:
multiprocesses, numThreads = get_processes(args)
else:
print("No workload specified. Exiting!\n", file=sys.stderr)
sys.exit(1)
(CPUClass, test_mem_mode, FutureClass) = Simulation.setCPUClass(args)
CPUClass.numThreads = numThreads
# Check -- do not allow SMT with multiple CPUs
if args.smt and args.num_cpus > 1:
fatal("You cannot use SMT with multiple CPUs!")
np = args.num_cpus
mp0_path = multiprocesses[0].executable
system = System(
cpu=[CPUClass(cpu_id=i) for i in range(np)],
mem_mode=test_mem_mode,
mem_ranges=[AddrRange(args.mem_size)],
cache_line_size=args.cacheline_size,
)
if numThreads > 1:
system.multi_thread = True
# Create a top-level voltage domain
system.voltage_domain = VoltageDomain(voltage=args.sys_voltage)
# Create a source clock for the system and set the clock period
system.clk_domain = SrcClockDomain(
clock=args.sys_clock, voltage_domain=system.voltage_domain
)
# Create a CPU voltage domain
system.cpu_voltage_domain = VoltageDomain()
# Create a separate clock domain for the CPUs
system.cpu_clk_domain = SrcClockDomain(
clock=args.cpu_clock, voltage_domain=system.cpu_voltage_domain
)
# If elastic tracing is enabled, then configure the cpu and attach the elastic
# trace probe
if args.elastic_trace_en:
CpuConfig.config_etrace(CPUClass, system.cpu, args)
# All cpus belong to a common cpu_clk_domain, therefore running at a common
# frequency.
for cpu in system.cpu:
cpu.clk_domain = system.cpu_clk_domain
if ObjectList.is_kvm_cpu(CPUClass) or ObjectList.is_kvm_cpu(FutureClass):
if buildEnv["USE_X86_ISA"]:
system.kvm_vm = KvmVM()
system.m5ops_base = 0xFFFF0000
for process in multiprocesses:
process.useArchPT = True
process.kvmInSE = True
else:
fatal("KvmCPU can only be used in SE mode with x86")
# Sanity check
if args.simpoint_profile:
if not ObjectList.is_noncaching_cpu(CPUClass):
fatal("SimPoint/BPProbe should be done with an atomic cpu")
if np > 1:
fatal("SimPoint generation not supported with more than one CPUs")
for i in range(np):
if args.smt:
system.cpu[i].workload = multiprocesses
elif len(multiprocesses) == 1:
system.cpu[i].workload = multiprocesses[0]
else:
system.cpu[i].workload = multiprocesses[i]
if args.simpoint_profile:
system.cpu[i].addSimPointProbe(args.simpoint_interval)
if args.checker:
system.cpu[i].addCheckerCpu()
if args.bp_type:
bpClass = ObjectList.bp_list.get(args.bp_type)
system.cpu[i].branchPred = bpClass()
if args.indirect_bp_type:
indirectBPClass = ObjectList.indirect_bp_list.get(
args.indirect_bp_type
)
system.cpu[i].branchPred.indirectBranchPred = indirectBPClass()
system.cpu[i].createThreads()
if args.ruby:
Ruby.create_system(args, False, system)
assert args.num_cpus == len(system.ruby._cpu_ports)
system.ruby.clk_domain = SrcClockDomain(
clock=args.ruby_clock, voltage_domain=system.voltage_domain
)
for i in range(np):
ruby_port = system.ruby._cpu_ports[i]
# Create the interrupt controller and connect its ports to Ruby
# Note that the interrupt controller is always present but only
# in x86 does it have message ports that need to be connected
system.cpu[i].createInterruptController()
# Connect the cpu's cache ports to Ruby
ruby_port.connectCpuPorts(system.cpu[i])
else:
MemClass = Simulation.setMemClass(args)
system.membus = SystemXBar()
system.system_port = system.membus.cpu_side_ports
CacheConfig.config_cache(args, system)
MemConfig.config_mem(args, system)
config_filesystem(system, args)
system.workload = SEWorkload.init_compatible(mp0_path)
if args.wait_gdb:
system.workload.wait_for_remote_gdb = True
root = Root(full_system=False, system=system)
Simulation.run(args, root, system, FutureClass)

View File

@@ -85,7 +85,7 @@ parser.add_argument(
"--cu-per-sqc",
type=int,
default=4,
help="number of CUs" "sharing an SQC (icache, and thus icache TLB)",
help="number of CUssharing an SQC (icache, and thus icache TLB)",
)
parser.add_argument(
"--cu-per-scalar-cache",
@@ -94,7 +94,7 @@ parser.add_argument(
help="Number of CUs sharing a scalar cache",
)
parser.add_argument(
"--simds-per-cu", type=int, default=4, help="SIMD units" "per CU"
"--simds-per-cu", type=int, default=4, help="SIMD unitsper CU"
)
parser.add_argument(
"--cu-per-sa",
@@ -140,13 +140,13 @@ parser.add_argument(
"--glbmem-wr-bus-width",
type=int,
default=32,
help="VGPR to Coalescer (Global Memory) data bus width " "in bytes",
help="VGPR to Coalescer (Global Memory) data bus width in bytes",
)
parser.add_argument(
"--glbmem-rd-bus-width",
type=int,
default=32,
help="Coalescer to VGPR (Global Memory) data bus width in " "bytes",
help="Coalescer to VGPR (Global Memory) data bus width in bytes",
)
# Currently we only support 1 local memory pipe
parser.add_argument(
@@ -166,7 +166,7 @@ parser.add_argument(
"--wfs-per-simd",
type=int,
default=10,
help="Number of " "WF slots per SIMD",
help="Number of WF slots per SIMD",
)
parser.add_argument(
@@ -276,12 +276,25 @@ parser.add_argument(
help="Latency for responses from ruby to the cu.",
)
parser.add_argument(
"--TLB-prefetch", type=int, help="prefetch depth for" "TLBs"
"--scalar-mem-req-latency",
type=int,
default=50,
help="Latency for scalar requests from the cu to ruby.",
)
parser.add_argument(
"--scalar-mem-resp-latency",
type=int,
# Set to 0 as the scalar cache response path does not model
# response latency yet and this parameter is currently not used
default=0,
help="Latency for scalar responses from ruby to the cu.",
)
parser.add_argument("--TLB-prefetch", type=int, help="prefetch depth for TLBs")
parser.add_argument(
"--pf-type",
type=str,
help="type of prefetch: " "PF_CU, PF_WF, PF_PHASE, PF_STRIDE",
help="type of prefetch: PF_CU, PF_WF, PF_PHASE, PF_STRIDE",
)
parser.add_argument("--pf-stride", type=int, help="set prefetch stride")
parser.add_argument(
@@ -354,7 +367,7 @@ parser.add_argument(
type=str,
default="gfx801",
choices=GfxVersion.vals,
help="Gfx version for gpu" "Note: gfx902 is not fully supported by ROCm",
help="Gfx version for gpuNote: gfx902 is not fully supported by ROCm",
)
Ruby.define_options(parser)
@@ -463,6 +476,8 @@ for i in range(n_cu):
vrf_lm_bus_latency=args.vrf_lm_bus_latency,
mem_req_latency=args.mem_req_latency,
mem_resp_latency=args.mem_resp_latency,
scalar_mem_req_latency=args.scalar_mem_req_latency,
scalar_mem_resp_latency=args.scalar_mem_resp_latency,
localDataStore=LdsState(
banks=args.numLdsBanks,
bankConflictPenalty=args.ldsBankConflictPenalty,
@@ -668,7 +683,7 @@ def find_path(base_list, rel_path, test):
full_path = os.path.join(base, rel_path)
if test(full_path):
return full_path
fatal("%s not found in %s" % (rel_path, base_list))
fatal(f"{rel_path} not found in {base_list}")
def find_file(base_list, rel_path):
@@ -702,7 +717,7 @@ else:
"/usr/lib/x86_64-linux-gnu",
]
),
"HOME=%s" % os.getenv("HOME", "/"),
f"HOME={os.getenv('HOME', '/')}",
# Disable the VM fault handler signal creation for dGPUs also
# forces the use of DefaultSignals instead of driver-controlled
# InteruptSignals throughout the runtime. DefaultSignals poll
@@ -907,14 +922,10 @@ else:
redirect_paths = [
RedirectPath(
app_path="/proc", host_paths=["%s/fs/proc" % m5.options.outdir]
),
RedirectPath(
app_path="/sys", host_paths=["%s/fs/sys" % m5.options.outdir]
),
RedirectPath(
app_path="/tmp", host_paths=["%s/fs/tmp" % m5.options.outdir]
app_path="/proc", host_paths=[f"{m5.options.outdir}/fs/proc"]
),
RedirectPath(app_path="/sys", host_paths=[f"{m5.options.outdir}/fs/sys"]),
RedirectPath(app_path="/tmp", host_paths=[f"{m5.options.outdir}/fs/tmp"]),
]
system.redirect_paths = redirect_paths
@@ -966,7 +977,7 @@ exit_event = m5.simulate(maxtick)
if args.fast_forward:
if exit_event.getCause() == "a thread reached the max instruction count":
m5.switchCpus(system, switch_cpu_list)
print("Switched CPUS @ tick %s" % (m5.curTick()))
print(f"Switched CPUS @ tick {m5.curTick()}")
m5.stats.reset()
exit_event = m5.simulate(maxtick - m5.curTick())
elif args.fast_forward_pseudo_op:
@@ -977,7 +988,7 @@ elif args.fast_forward_pseudo_op:
print("Dumping stats...")
m5.stats.dump()
m5.switchCpus(system, switch_cpu_list)
print("Switched CPUS @ tick %s" % (m5.curTick()))
print(f"Switched CPUS @ tick {m5.curTick()}")
m5.stats.reset()
# This lets us switch back and forth without keeping a counter
switch_cpu_list = [(x[1], x[0]) for x in switch_cpu_list]

View File

@@ -1,4 +1,4 @@
# Copyright (c) 2016-2017,2019-2021 ARM Limited
# Copyright (c) 2016-2017,2019-2023 Arm Limited
# All rights reserved.
#
# The license below extends only to copyright in the software and shall
@@ -44,6 +44,7 @@ import m5
from m5.util import addToPath
from m5.objects import *
from m5.options import *
from gem5.simulate.exit_event import ExitEvent
import argparse
m5.util.addToPath("../..")
@@ -52,6 +53,7 @@ from common import SysPaths
from common import MemConfig
from common import ObjectList
from common.cores.arm import HPI
from common.cores.arm import O3_ARM_v7a
import devices
import workloads
@@ -63,8 +65,26 @@ cpu_types = {
"atomic": (AtomicSimpleCPU, None, None, None),
"minor": (MinorCPU, devices.L1I, devices.L1D, devices.L2),
"hpi": (HPI.HPI, HPI.HPI_ICache, HPI.HPI_DCache, HPI.HPI_L2),
"o3": (
O3_ARM_v7a.O3_ARM_v7a_3,
O3_ARM_v7a.O3_ARM_v7a_ICache,
O3_ARM_v7a.O3_ARM_v7a_DCache,
O3_ARM_v7a.O3_ARM_v7aL2,
),
}
pmu_control_events = {
"enable": ExitEvent.PERF_COUNTER_ENABLE,
"disable": ExitEvent.PERF_COUNTER_DISABLE,
"reset": ExitEvent.PERF_COUNTER_RESET,
}
pmu_interrupt_events = {
"interrupt": ExitEvent.PERF_COUNTER_INTERRUPT,
}
pmu_stats_events = dict(**pmu_control_events, **pmu_interrupt_events)
def create_cow_image(name):
"""Helper function to create a Copy-on-Write disk image"""
@@ -77,7 +97,7 @@ def create(args):
"""Create and configure the system object."""
if args.readfile and not os.path.isfile(args.readfile):
print("Error: Bootscript %s does not exist" % args.readfile)
print(f"Error: Bootscript {args.readfile} does not exist")
sys.exit(1)
object_file = args.kernel if args.kernel else ""
@@ -122,8 +142,14 @@ def create(args):
# Add CPU clusters to the system
system.cpu_cluster = [
devices.CpuCluster(
system, args.num_cores, args.cpu_freq, "1.0V", *cpu_types[args.cpu]
devices.ArmCpuCluster(
system,
args.num_cores,
args.cpu_freq,
"1.0V",
*cpu_types[args.cpu],
tarmac_gen=args.tarmac_gen,
tarmac_dest=args.tarmac_dest,
)
]
@@ -136,34 +162,85 @@ def create(args):
system.auto_reset_addr = True
# Using GICv3
system.realview.gic.gicv4 = False
if hasattr(system.realview.gic, "gicv4"):
system.realview.gic.gicv4 = False
system.highest_el_is_64 = True
workload_class = workloads.workload_list.get(args.workload)
system.workload = workload_class(object_file, system)
if args.with_pmu:
enabled_pmu_events = set(
(*args.pmu_dump_stats_on, *args.pmu_reset_stats_on)
)
exit_sim_on_control = bool(
enabled_pmu_events & set(pmu_control_events.keys())
)
exit_sim_on_interrupt = bool(
enabled_pmu_events & set(pmu_interrupt_events.keys())
)
for cluster in system.cpu_cluster:
interrupt_numbers = [args.pmu_ppi_number] * len(cluster)
cluster.addPMUs(
interrupt_numbers,
exit_sim_on_control=exit_sim_on_control,
exit_sim_on_interrupt=exit_sim_on_interrupt,
)
if args.exit_on_uart_eot:
for uart in system.realview.uart:
uart.end_on_eot = True
return system
def run(args):
cptdir = m5.options.outdir
if args.checkpoint:
print("Checkpoint directory: %s" % cptdir)
print(f"Checkpoint directory: {cptdir}")
pmu_exit_msgs = tuple(evt.value for evt in pmu_stats_events.values())
pmu_stats_dump_msgs = tuple(
pmu_stats_events[evt].value for evt in set(args.pmu_dump_stats_on)
)
pmu_stats_reset_msgs = tuple(
pmu_stats_events[evt].value for evt in set(args.pmu_reset_stats_on)
)
while True:
event = m5.simulate()
exit_msg = event.getCause()
if exit_msg == "checkpoint":
print("Dropping checkpoint at tick %d" % m5.curTick())
if exit_msg == ExitEvent.CHECKPOINT.value:
print(f"Dropping checkpoint at tick {m5.curTick():d}")
cpt_dir = os.path.join(m5.options.outdir, "cpt.%d" % m5.curTick())
m5.checkpoint(os.path.join(cpt_dir))
print("Checkpoint done.")
elif exit_msg in pmu_exit_msgs:
if exit_msg in pmu_stats_dump_msgs:
print(
f"Dumping stats at tick {m5.curTick():d}, "
f"due to {exit_msg}"
)
m5.stats.dump()
if exit_msg in pmu_stats_reset_msgs:
print(
f"Resetting stats at tick {m5.curTick():d}, "
f"due to {exit_msg}"
)
m5.stats.reset()
else:
print(exit_msg, " @ ", m5.curTick())
print(f"{exit_msg} ({event.getCode()}) @ {m5.curTick()}")
break
sys.exit(event.getCode())
def arm_ppi_arg(int_num: int) -> int:
"""Argparse argument parser for valid Arm PPI numbers."""
# Extended PPIs (1056 <= int_num <= 1119) are not yet supported by gem5
int_num = int(int_num)
if 16 <= int_num <= 31:
return int_num
raise ValueError(f"{int_num} is not a valid Arm PPI number")
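The validator above is an ordinary function, so it can be exercised outside argparse. A standalone copy for illustration (the usage below is a sketch, not part of the script):

```python
# Standalone copy of the --pmu-ppi-number validator: Arm PPIs occupy
# interrupt IDs 16 through 31; anything else is rejected.
def arm_ppi_arg(int_num):
    int_num = int(int_num)
    if 16 <= int_num <= 31:
        return int_num
    raise ValueError(f"{int_num} is not a valid Arm PPI number")

print(arm_ppi_arg("23"))  # 23
# arm_ppi_arg("32") would raise ValueError, which argparse reports
# as an "invalid arm_ppi_arg value" usage error.
```

Because argparse treats any exception from a `type` callable as a parse failure, raising `ValueError` here is enough to get a clean command-line error message.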
def main():
@@ -230,6 +307,55 @@ def main():
)
parser.add_argument("--checkpoint", action="store_true")
parser.add_argument("--restore", type=str, default=None)
parser.add_argument(
"--tarmac-gen",
action="store_true",
help="Write a Tarmac trace.",
)
parser.add_argument(
"--tarmac-dest",
choices=TarmacDump.vals,
default="stdoutput",
help="Destination for the Tarmac trace output. [Default: stdoutput]",
)
parser.add_argument(
"--with-pmu",
action="store_true",
help="Add a PMU to each core in the cluster.",
)
parser.add_argument(
"--pmu-ppi-number",
type=arm_ppi_arg,
default=23,
help="The number of the PPI to use to connect each PMU to its core. "
"Must be an integer and a valid PPI number (16 <= int_num <= 31).",
)
parser.add_argument(
"--pmu-dump-stats-on",
type=str,
default=[],
action="append",
choices=pmu_stats_events.keys(),
help="Specify the PMU events on which to dump the gem5 stats. "
"This option may be specified multiple times to enable multiple "
"PMU events.",
)
parser.add_argument(
"--pmu-reset-stats-on",
type=str,
default=[],
action="append",
choices=pmu_stats_events.keys(),
help="Specify the PMU events on which to reset the gem5 stats. "
"This option may be specified multiple times to enable multiple "
"PMU events.",
)
parser.add_argument(
"--exit-on-uart-eot",
action="store_true",
help="Exit simulation if any of the UARTs receive an EOT. Many "
"workloads signal termination by sending an EOT character.",
)
parser.add_argument(
"--dtb-gen",
action="store_true",
@@ -242,25 +368,25 @@ def main():
"--semi-stdin",
type=str,
default="stdin",
help="Standard input for semihosting " "(default: gem5's stdin)",
help="Standard input for semihosting (default: gem5's stdin)",
)
parser.add_argument(
"--semi-stdout",
type=str,
default="stdout",
help="Standard output for semihosting " "(default: gem5's stdout)",
help="Standard output for semihosting (default: gem5's stdout)",
)
parser.add_argument(
"--semi-stderr",
type=str,
default="stderr",
help="Standard error for semihosting " "(default: gem5's stderr)",
help="Standard error for semihosting (default: gem5's stderr)",
)
parser.add_argument(
"--semi-path",
type=str,
default="",
help=("Search path for files to be loaded through " "Arm Semihosting"),
help=("Search path for files to be loaded through Arm Semihosting"),
)
parser.add_argument(
"args",

View File

@@ -1,4 +1,4 @@
# Copyright (c) 2016-2017, 2019, 2021 Arm Limited
# Copyright (c) 2016-2017, 2019, 2021-2023 Arm Limited
# All rights reserved.
#
# The license below extends only to copyright in the software and shall
@@ -95,7 +95,7 @@ class MemBus(SystemXBar):
default = Self.badaddr_responder.pio
class CpuCluster(SubSystem):
class ArmCpuCluster(CpuCluster):
def __init__(
self,
system,
@@ -106,8 +106,10 @@ class CpuCluster(SubSystem):
l1i_type,
l1d_type,
l2_type,
tarmac_gen=False,
tarmac_dest=None,
):
super(CpuCluster, self).__init__()
super().__init__()
self._cpu_type = cpu_type
self._l1i_type = l1i_type
self._l1d_type = l1d_type
@@ -120,24 +122,15 @@ class CpuCluster(SubSystem):
clock=cpu_clock, voltage_domain=self.voltage_domain
)
self.cpus = [
self._cpu_type(
cpu_id=system.numCpus() + idx, clk_domain=self.clk_domain
)
for idx in range(num_cpus)
]
self.generate_cpus(cpu_type, num_cpus)
for cpu in self.cpus:
cpu.createThreads()
cpu.createInterruptController()
cpu.socket_id = system.numCpuClusters()
system.addCpuCluster(self, num_cpus)
if tarmac_gen:
cpu.tracer = TarmacTracer()
if tarmac_dest is not None:
cpu.tracer.outfile = tarmac_dest
def requireCaches(self):
return self._cpu_type.require_caches()
def memoryMode(self):
return self._cpu_type.memory_mode()
system.addCpuCluster(self)
def addL1(self):
for cpu in self.cpus:
@@ -154,7 +147,13 @@ class CpuCluster(SubSystem):
cpu.connectCachedPorts(self.toL2Bus.cpu_side_ports)
self.toL2Bus.mem_side_ports = self.l2.cpu_side
def addPMUs(self, ints, events=[]):
def addPMUs(
self,
ints,
events=[],
exit_sim_on_control=False,
exit_sim_on_interrupt=False,
):
"""
Instantiates 1 ArmPMU per PE. The method is accepting a list of
interrupt numbers (ints) used by the PMU and a list of events to
@@ -166,12 +165,21 @@ class CpuCluster(SubSystem):
:type ints: List[int]
:param events: Additional events to be measured by the PMUs
:type events: List[Union[ProbeEvent, SoftwareIncrement]]
:param exit_sim_on_control: If true, exit the sim loop when the PMU is
enabled, disabled, or reset.
:type exit_sim_on_control: bool
:param exit_sim_on_interrupt: If true, exit the sim loop when the PMU
triggers an interrupt.
:type exit_sim_on_interrupt: bool
"""
assert len(ints) == len(self.cpus)
for cpu, pint in zip(self.cpus, ints):
int_cls = ArmPPI if pint < 32 else ArmSPI
for isa in cpu.isa:
isa.pmu = ArmPMU(interrupt=int_cls(num=pint))
isa.pmu.exitOnPMUControl = exit_sim_on_control
isa.pmu.exitOnPMUInterrupt = exit_sim_on_interrupt
isa.pmu.addArchEvents(
cpu=cpu,
itb=cpu.mmu.itb,
@@ -191,36 +199,63 @@ class CpuCluster(SubSystem):
cpu.connectCachedPorts(bus.cpu_side_ports)
class AtomicCluster(CpuCluster):
def __init__(self, system, num_cpus, cpu_clock, cpu_voltage="1.0V"):
cpu_config = [
ObjectList.cpu_list.get("AtomicSimpleCPU"),
None,
None,
None,
]
super(AtomicCluster, self).__init__(
system, num_cpus, cpu_clock, cpu_voltage, *cpu_config
class AtomicCluster(ArmCpuCluster):
def __init__(
self,
system,
num_cpus,
cpu_clock,
cpu_voltage="1.0V",
tarmac_gen=False,
tarmac_dest=None,
):
super().__init__(
system,
num_cpus,
cpu_clock,
cpu_voltage,
cpu_type=ObjectList.cpu_list.get("AtomicSimpleCPU"),
l1i_type=None,
l1d_type=None,
l2_type=None,
tarmac_gen=tarmac_gen,
tarmac_dest=tarmac_dest,
)
def addL1(self):
pass
class KvmCluster(CpuCluster):
def __init__(self, system, num_cpus, cpu_clock, cpu_voltage="1.0V"):
cpu_config = [ObjectList.cpu_list.get("ArmV8KvmCPU"), None, None, None]
super(KvmCluster, self).__init__(
system, num_cpus, cpu_clock, cpu_voltage, *cpu_config
class KvmCluster(ArmCpuCluster):
def __init__(
self,
system,
num_cpus,
cpu_clock,
cpu_voltage="1.0V",
tarmac_gen=False,
tarmac_dest=None,
):
super().__init__(
system,
num_cpus,
cpu_clock,
cpu_voltage,
cpu_type=ObjectList.cpu_list.get("ArmV8KvmCPU"),
l1i_type=None,
l1d_type=None,
l2_type=None,
tarmac_gen=tarmac_gen,
tarmac_dest=tarmac_dest,
)
def addL1(self):
pass
class FastmodelCluster(SubSystem):
class FastmodelCluster(CpuCluster):
def __init__(self, system, num_cpus, cpu_clock, cpu_voltage="1.0V"):
super(FastmodelCluster, self).__init__()
super().__init__()
# Setup GIC
gic = system.realview.gic
@@ -285,12 +320,12 @@ class FastmodelCluster(SubSystem):
self.cpu_hub.a2t = a2t
self.cpu_hub.t2g = t2g
system.addCpuCluster(self, num_cpus)
system.addCpuCluster(self)
def requireCaches(self):
def require_caches(self):
return False
def memoryMode(self):
def memory_mode(self):
return "atomic_noncaching"
def addL1(self):
@@ -330,7 +365,6 @@ class BaseSimpleSystem(ArmSystem):
self.mem_ranges = self.getMemRanges(int(Addr(mem_size)))
self._clusters = []
self._num_cpus = 0
def getMemRanges(self, mem_size):
"""
@@ -357,14 +391,8 @@ class BaseSimpleSystem(ArmSystem):
def numCpuClusters(self):
return len(self._clusters)
def addCpuCluster(self, cpu_cluster, num_cpus):
assert cpu_cluster not in self._clusters
assert num_cpus > 0
def addCpuCluster(self, cpu_cluster):
self._clusters.append(cpu_cluster)
self._num_cpus += num_cpus
def numCpus(self):
return self._num_cpus
def addCaches(self, need_caches, last_cache_level):
if not need_caches:


@@ -51,7 +51,7 @@ import sw
def addOptions(parser):
# Options for distributed simulation (i.e. dist-gem5)
parser.add_argument(
"--dist", action="store_true", help="Distributed gem5" " simulation."
"--dist", action="store_true", help="Distributed gem5 simulation."
)
parser.add_argument(
"--is-switch",
@@ -71,14 +71,14 @@ def addOptions(parser):
default=0,
action="store",
type=int,
help="Number of gem5 processes within the dist gem5" " run.",
help="Number of gem5 processes within the dist gem5 run.",
)
parser.add_argument(
"--dist-server-name",
default="127.0.0.1",
action="store",
type=str,
help="Name of the message server host\nDEFAULT:" " localhost",
help="Name of the message server host\nDEFAULT: localhost",
)
parser.add_argument(
"--dist-server-port",


@@ -79,7 +79,7 @@ def _using_pdes(root):
return False
class BigCluster(devices.CpuCluster):
class BigCluster(devices.ArmCpuCluster):
def __init__(self, system, num_cpus, cpu_clock, cpu_voltage="1.0V"):
cpu_config = [
ObjectList.cpu_list.get("O3_ARM_v7a_3"),
@@ -87,12 +87,10 @@ class BigCluster(devices.CpuCluster):
devices.L1D,
devices.L2,
]
super(BigCluster, self).__init__(
system, num_cpus, cpu_clock, cpu_voltage, *cpu_config
)
super().__init__(system, num_cpus, cpu_clock, cpu_voltage, *cpu_config)
class LittleCluster(devices.CpuCluster):
class LittleCluster(devices.ArmCpuCluster):
def __init__(self, system, num_cpus, cpu_clock, cpu_voltage="1.0V"):
cpu_config = [
ObjectList.cpu_list.get("MinorCPU"),
@@ -100,9 +98,7 @@ class LittleCluster(devices.CpuCluster):
devices.L1D,
devices.L2,
]
super(LittleCluster, self).__init__(
system, num_cpus, cpu_clock, cpu_voltage, *cpu_config
)
super().__init__(system, num_cpus, cpu_clock, cpu_voltage, *cpu_config)
class Ex5BigCluster(devices.CpuCluster):
@@ -113,9 +109,7 @@ class Ex5BigCluster(devices.CpuCluster):
ex5_big.L1D,
ex5_big.L2,
]
super(Ex5BigCluster, self).__init__(
system, num_cpus, cpu_clock, cpu_voltage, *cpu_config
)
super().__init__(system, num_cpus, cpu_clock, cpu_voltage, *cpu_config)
class Ex5LittleCluster(devices.CpuCluster):
@@ -126,9 +120,7 @@ class Ex5LittleCluster(devices.CpuCluster):
ex5_LITTLE.L1D,
ex5_LITTLE.L2,
]
super(Ex5LittleCluster, self).__init__(
system, num_cpus, cpu_clock, cpu_voltage, *cpu_config
)
super().__init__(system, num_cpus, cpu_clock, cpu_voltage, *cpu_config)
def createSystem(
@@ -339,10 +331,10 @@ def build(options):
"lpj=19988480",
"norandmaps",
"loglevel=8",
"mem=%s" % options.mem_size,
"root=%s" % options.root,
f"mem={options.mem_size}",
f"root={options.root}",
"rw",
"init=%s" % options.kernel_init,
f"init={options.kernel_init}",
"vmalloc=768MB",
]
@@ -376,7 +368,7 @@ def build(options):
system.bigCluster = big_model(
system, options.big_cpus, options.big_cpu_clock
)
system.mem_mode = system.bigCluster.memoryMode()
system.mem_mode = system.bigCluster.memory_mode()
all_cpus += system.bigCluster.cpus
# little cluster
@@ -384,23 +376,24 @@ def build(options):
system.littleCluster = little_model(
system, options.little_cpus, options.little_cpu_clock
)
system.mem_mode = system.littleCluster.memoryMode()
system.mem_mode = system.littleCluster.memory_mode()
all_cpus += system.littleCluster.cpus
# Figure out the memory mode
if (
options.big_cpus > 0
and options.little_cpus > 0
and system.bigCluster.memoryMode() != system.littleCluster.memoryMode()
and system.bigCluster.memory_mode()
!= system.littleCluster.memory_mode()
):
        m5.util.panic("Memory mode mismatch among CPU clusters")
# create caches
system.addCaches(options.caches, options.last_cache_level)
if not options.caches:
if options.big_cpus > 0 and system.bigCluster.requireCaches():
if options.big_cpus > 0 and system.bigCluster.require_caches():
m5.util.panic("Big CPU model requires caches")
if options.little_cpus > 0 and system.littleCluster.requireCaches():
if options.little_cpus > 0 and system.littleCluster.require_caches():
m5.util.panic("Little CPU model requires caches")
# Create a KVM VM and do KVM-specific configuration


@@ -79,7 +79,7 @@ class L2PowerOn(MathExprPowerModel):
# Example to report l2 Cache overallAccesses
# The estimated power is converted to Watt and will vary based
# on the size of the cache
self.dyn = "{}.overallAccesses * 0.000018000".format(l2_path)
self.dyn = f"{l2_path}.overallAccesses * 0.000018000"
self.st = "(voltage * 3)/10"


@@ -100,7 +100,7 @@ def create(args):
"""Create and configure the system object."""
if args.script and not os.path.isfile(args.script):
print("Error: Bootscript %s does not exist" % args.script)
print(f"Error: Bootscript {args.script} does not exist")
sys.exit(1)
cpu_class = cpu_types[args.cpu]
@@ -115,7 +115,7 @@ def create(args):
# Add CPU clusters to the system
system.cpu_cluster = [
devices.CpuCluster(
devices.ArmCpuCluster(
system,
args.num_cpus,
args.cpu_freq,
@@ -171,11 +171,11 @@ def create(args):
# memory layout.
"norandmaps",
# Tell Linux where to find the root disk image.
"root=%s" % args.root_device,
f"root={args.root_device}",
# Mount the root disk read-write by default.
"rw",
# Tell Linux about the amount of physical memory present.
"mem=%s" % args.mem_size,
f"mem={args.mem_size}",
]
system.workload.command_line = " ".join(kernel_cmd)
@@ -185,7 +185,7 @@ def create(args):
def run(args):
cptdir = m5.options.outdir
if args.checkpoint:
print("Checkpoint directory: %s" % cptdir)
print(f"Checkpoint directory: {cptdir}")
while True:
event = m5.simulate()
@@ -221,9 +221,7 @@ def main():
"--root-device",
type=str,
default=default_root_device,
help="OS device name for root partition (default: {})".format(
default_root_device
),
help=f"OS device name for root partition (default: {default_root_device})",
)
parser.add_argument(
"--script", type=str, default="", help="Linux bootscript"


@@ -88,7 +88,7 @@ def create(args):
"""Create and configure the system object."""
if args.script and not os.path.isfile(args.script):
print("Error: Bootscript %s does not exist" % args.script)
print(f"Error: Bootscript {args.script} does not exist")
sys.exit(1)
cpu_class = cpu_types[args.cpu][0]
@@ -128,8 +128,14 @@ def create(args):
# Add CPU clusters to the system
system.cpu_cluster = [
devices.CpuCluster(
system, args.num_cores, args.cpu_freq, "1.0V", *cpu_types[args.cpu]
devices.ArmCpuCluster(
system,
args.num_cores,
args.cpu_freq,
"1.0V",
*cpu_types[args.cpu],
tarmac_gen=args.tarmac_gen,
tarmac_dest=args.tarmac_dest,
)
]
@@ -163,21 +169,26 @@ def create(args):
# memory layout.
"norandmaps",
# Tell Linux where to find the root disk image.
"root=%s" % args.root_device,
f"root={args.root_device}",
# Mount the root disk read-write by default.
"rw",
# Tell Linux about the amount of physical memory present.
"mem=%s" % args.mem_size,
f"mem={args.mem_size}",
]
system.workload.command_line = " ".join(kernel_cmd)
if args.with_pmu:
for cluster in system.cpu_cluster:
interrupt_numbers = [args.pmu_ppi_number] * len(cluster)
cluster.addPMUs(interrupt_numbers)
return system
def run(args):
cptdir = m5.options.outdir
if args.checkpoint:
print("Checkpoint directory: %s" % cptdir)
print(f"Checkpoint directory: {cptdir}")
while True:
event = m5.simulate()
@@ -188,10 +199,17 @@ def run(args):
m5.checkpoint(os.path.join(cpt_dir))
print("Checkpoint done.")
else:
print(exit_msg, " @ ", m5.curTick())
print(f"{exit_msg} ({event.getCode()}) @ {m5.curTick()}")
break
sys.exit(event.getCode())
def arm_ppi_arg(int_num: int) -> int:
"""Argparse argument parser for valid Arm PPI numbers."""
# PPIs (1056 <= int_num <= 1119) are not yet supported by gem5
int_num = int(int_num)
if 16 <= int_num <= 31:
return int_num
raise ValueError(f"{int_num} is not a valid Arm PPI number")
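The validator above can be exercised on its own. A minimal sketch (the argparse wiring mirrors the script's `--pmu-ppi-number` option; the standalone parser here is illustrative, not the script's full parser):

```python
import argparse


def arm_ppi_arg(int_num: int) -> int:
    """Return int_num if it is a valid Arm PPI number (16-31), else raise."""
    int_num = int(int_num)
    if 16 <= int_num <= 31:
        return int_num
    raise ValueError(f"{int_num} is not a valid Arm PPI number")


parser = argparse.ArgumentParser()
# argparse calls the type callable on the raw string; a ValueError is
# reported to the user as a clean "invalid arm_ppi_arg value" error.
parser.add_argument("--pmu-ppi-number", type=arm_ppi_arg, default=23)

args = parser.parse_args(["--pmu-ppi-number", "27"])
print(args.pmu_ppi_number)  # parses the string "27" to the int 27
```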
def main():
@@ -219,9 +237,7 @@ def main():
"--root-device",
type=str,
default=default_root_device,
help="OS device name for root partition (default: {})".format(
default_root_device
),
help=f"OS device name for root partition (default: {default_root_device})",
)
parser.add_argument(
"--script", type=str, default="", help="Linux bootscript"
@@ -259,6 +275,29 @@ def main():
default="2GB",
help="Specify the physical memory size",
)
parser.add_argument(
"--tarmac-gen",
action="store_true",
help="Write a Tarmac trace.",
)
parser.add_argument(
"--tarmac-dest",
choices=TarmacDump.vals,
default="stdoutput",
help="Destination for the Tarmac trace output. [Default: stdoutput]",
)
parser.add_argument(
"--with-pmu",
action="store_true",
help="Add a PMU to each core in the cluster.",
)
parser.add_argument(
"--pmu-ppi-number",
type=arm_ppi_arg,
default=23,
help="The number of the PPI to use to connect each PMU to its core. "
"Must be an integer and a valid PPI number (16 <= int_num <= 31).",
)
parser.add_argument("--checkpoint", action="store_true")
parser.add_argument("--restore", type=str, default=None)


@@ -1,4 +1,4 @@
# Copyright (c) 2016-2017 ARM Limited
# Copyright (c) 2016-2017, 2022-2023 Arm Limited
# All rights reserved.
#
# The license below extends only to copyright in the software and shall
@@ -95,30 +95,36 @@ class SimpleSeSystem(System):
# Add CPUs to the system. A cluster of CPUs typically have
# private L1 caches and a shared L2 cache.
self.cpu_cluster = devices.CpuCluster(
self, args.num_cores, args.cpu_freq, "1.2V", *cpu_types[args.cpu]
self.cpu_cluster = devices.ArmCpuCluster(
self,
args.num_cores,
args.cpu_freq,
"1.2V",
*cpu_types[args.cpu],
tarmac_gen=args.tarmac_gen,
tarmac_dest=args.tarmac_dest,
)
# Create a cache hierarchy (unless we are simulating a
# functional CPU in atomic memory mode) for the CPU cluster
# and connect it to the shared memory bus.
if self.cpu_cluster.memoryMode() == "timing":
if self.cpu_cluster.memory_mode() == "timing":
self.cpu_cluster.addL1()
self.cpu_cluster.addL2(self.cpu_cluster.clk_domain)
self.cpu_cluster.connectMemSide(self.membus)
# Tell gem5 about the memory mode used by the CPUs we are
# simulating.
self.mem_mode = self.cpu_cluster.memoryMode()
self.mem_mode = self.cpu_cluster.memory_mode()
def numCpuClusters(self):
return len(self._clusters)
def addCpuCluster(self, cpu_cluster, num_cpus):
def addCpuCluster(self, cpu_cluster):
assert cpu_cluster not in self._clusters
assert num_cpus > 0
assert len(cpu_cluster) > 0
self._clusters.append(cpu_cluster)
self._num_cpus += num_cpus
self._num_cpus += len(cpu_cluster)
def numCpus(self):
return self._num_cpus
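The new registration API above drops the explicit `num_cpus` argument and instead relies on clusters reporting their own size via `len()`. A minimal, hypothetical sketch of the pattern (stand-in classes, not the actual gem5 `CpuCluster`/`System`):

```python
class FakeCluster:
    """Stand-in for a CPU cluster; gem5's clusters similarly support len()."""

    def __init__(self, num_cpus):
        self.cpus = list(range(num_cpus))

    def __len__(self):
        return len(self.cpus)


class FakeSystem:
    def __init__(self):
        self._clusters = []
        self._num_cpus = 0

    def addCpuCluster(self, cpu_cluster):
        # No separate num_cpus parameter: the cluster knows its own size.
        assert cpu_cluster not in self._clusters
        assert len(cpu_cluster) > 0
        self._clusters.append(cpu_cluster)
        self._num_cpus += len(cpu_cluster)

    def numCpus(self):
        return self._num_cpus


system = FakeSystem()
system.addCpuCluster(FakeCluster(4))
system.addCpuCluster(FakeCluster(2))
print(system.numCpus())  # 6
```

Keeping the count inside the cluster removes the chance of the caller passing a `num_cpus` that disagrees with the cluster's actual CPU list.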
@@ -215,6 +221,17 @@ def main():
default="2GB",
help="Specify the physical memory size",
)
parser.add_argument(
"--tarmac-gen",
action="store_true",
help="Write a Tarmac trace.",
)
parser.add_argument(
"--tarmac-dest",
choices=TarmacDump.vals,
default="stdoutput",
help="Destination for the Tarmac trace output. [Default: stdoutput]",
)
args = parser.parse_args()
@@ -240,8 +257,7 @@ def main():
# Print the reason for the simulation exit. Some exit codes are
# requests for service (e.g., checkpoints) from the simulation
# script. We'll just ignore them here and exit.
print(event.getCause(), " @ ", m5.curTick())
sys.exit(event.getCode())
print(f"{event.getCause()} ({event.getCode()}) @ {m5.curTick()}")
if __name__ == "__m5_main__":

configs/example/dramsys.py (new executable file)

@@ -0,0 +1,63 @@
# Copyright (c) 2022 Fraunhofer IESE
# All rights reserved
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer;
# redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution;
# neither the name of the copyright holders nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
import m5
from m5.objects import *
traffic_gen = PyTrafficGen()
system = System()
vd = VoltageDomain(voltage="1V")
system.mem_mode = "timing"
system.cpu = traffic_gen
dramsys = DRAMSys(
configuration="ext/dramsys/DRAMSys/DRAMSys/"
"library/resources/simulations/ddr4-example.json",
resource_directory="ext/dramsys/DRAMSys/DRAMSys/library/resources",
)
system.target = dramsys
system.transactor = Gem5ToTlmBridge32()
system.clk_domain = SrcClockDomain(clock="1.5GHz", voltage_domain=vd)
# Connect everything:
system.transactor.gem5 = system.cpu.port
system.transactor.tlm = system.target.tlm
kernel = SystemC_Kernel(system=system)
root = Root(full_system=False, systemc_kernel=kernel)
m5.instantiate()
idle = traffic_gen.createIdle(100000)
linear = traffic_gen.createLinear(10000000, 0, 16777216, 64, 500, 1500, 65, 0)
random = traffic_gen.createRandom(10000000, 0, 16777216, 64, 500, 1500, 65, 0)
traffic_gen.start([linear, idle, random])
cause = m5.simulate(20000000).getCause()
print(cause)


@@ -1,19 +1,4 @@
# Copyright (c) 2010-2013, 2016, 2019-2020 ARM Limited
# Copyright (c) 2020 Barkhausen Institut
# All rights reserved.
#
# The license below extends only to copyright in the software and shall
# not be construed as granting a license to any other intellectual
# property including but not limited to intellectual property relating
# to a hardware implementation of the functionality of the software
# licensed hereunder. You may use the software subject to the license
# terms below provided that you ensure that this notice is replicated
# unmodified and in its entirety in all distributions of the software,
# modified or unmodified, in source code or in binary form.
#
# Copyright (c) 2012-2014 Mark D. Hill and David A. Wood
# Copyright (c) 2009-2011 Advanced Micro Devices, Inc.
# Copyright (c) 2006-2007 The Regents of The University of Michigan
# Copyright (c) 2023 The Regents of the University of California
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
@@ -39,401 +24,10 @@
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
import argparse
import sys
from m5.util import fatal
import m5
from m5.defines import buildEnv
from m5.objects import *
from m5.util import addToPath, fatal, warn
from m5.util.fdthelper import *
from gem5.isas import ISA
from gem5.runtime import get_runtime_isa
addToPath("../")
from ruby import Ruby
from common.FSConfig import *
from common.SysPaths import *
from common.Benchmarks import *
from common import Simulation
from common import CacheConfig
from common import CpuConfig
from common import MemConfig
from common import ObjectList
from common.Caches import *
from common import Options
def cmd_line_template():
if args.command_line and args.command_line_file:
print(
"Error: --command-line and --command-line-file are "
"mutually exclusive"
)
sys.exit(1)
if args.command_line:
return args.command_line
if args.command_line_file:
return open(args.command_line_file).read().strip()
return None
def build_test_system(np):
cmdline = cmd_line_template()
isa = get_runtime_isa()
if isa == ISA.MIPS:
test_sys = makeLinuxMipsSystem(test_mem_mode, bm[0], cmdline=cmdline)
elif isa == ISA.SPARC:
test_sys = makeSparcSystem(test_mem_mode, bm[0], cmdline=cmdline)
elif isa == ISA.RISCV:
test_sys = makeBareMetalRiscvSystem(
test_mem_mode, bm[0], cmdline=cmdline
)
elif isa == ISA.X86:
test_sys = makeLinuxX86System(
test_mem_mode, np, bm[0], args.ruby, cmdline=cmdline
)
elif isa == ISA.ARM:
test_sys = makeArmSystem(
test_mem_mode,
args.machine_type,
np,
bm[0],
args.dtb_filename,
bare_metal=args.bare_metal,
cmdline=cmdline,
external_memory=args.external_memory_system,
ruby=args.ruby,
vio_9p=args.vio_9p,
bootloader=args.bootloader,
)
if args.enable_context_switch_stats_dump:
test_sys.enable_context_switch_stats_dump = True
else:
fatal("Incapable of building %s full system!", isa.name)
# Set the cache line size for the entire system
test_sys.cache_line_size = args.cacheline_size
# Create a top-level voltage domain
test_sys.voltage_domain = VoltageDomain(voltage=args.sys_voltage)
# Create a source clock for the system and set the clock period
test_sys.clk_domain = SrcClockDomain(
clock=args.sys_clock, voltage_domain=test_sys.voltage_domain
)
# Create a CPU voltage domain
test_sys.cpu_voltage_domain = VoltageDomain()
# Create a source clock for the CPUs and set the clock period
test_sys.cpu_clk_domain = SrcClockDomain(
clock=args.cpu_clock, voltage_domain=test_sys.cpu_voltage_domain
)
if buildEnv["USE_RISCV_ISA"]:
test_sys.workload.bootloader = args.kernel
elif args.kernel is not None:
test_sys.workload.object_file = binary(args.kernel)
if args.script is not None:
test_sys.readfile = args.script
test_sys.init_param = args.init_param
# For now, assign all the CPUs to the same clock domain
test_sys.cpu = [
TestCPUClass(clk_domain=test_sys.cpu_clk_domain, cpu_id=i)
for i in range(np)
]
if args.ruby:
bootmem = getattr(test_sys, "_bootmem", None)
Ruby.create_system(
args, True, test_sys, test_sys.iobus, test_sys._dma_ports, bootmem
)
        # Create a separate clock domain for Ruby
test_sys.ruby.clk_domain = SrcClockDomain(
clock=args.ruby_clock, voltage_domain=test_sys.voltage_domain
)
# Connect the ruby io port to the PIO bus,
# assuming that there is just one such port.
test_sys.iobus.mem_side_ports = test_sys.ruby._io_port.in_ports
for (i, cpu) in enumerate(test_sys.cpu):
#
# Tie the cpu ports to the correct ruby system ports
#
cpu.clk_domain = test_sys.cpu_clk_domain
cpu.createThreads()
cpu.createInterruptController()
test_sys.ruby._cpu_ports[i].connectCpuPorts(cpu)
else:
if args.caches or args.l2cache:
# By default the IOCache runs at the system clock
test_sys.iocache = IOCache(addr_ranges=test_sys.mem_ranges)
test_sys.iocache.cpu_side = test_sys.iobus.mem_side_ports
test_sys.iocache.mem_side = test_sys.membus.cpu_side_ports
elif not args.external_memory_system:
test_sys.iobridge = Bridge(
delay="50ns", ranges=test_sys.mem_ranges
)
test_sys.iobridge.cpu_side_port = test_sys.iobus.mem_side_ports
test_sys.iobridge.mem_side_port = test_sys.membus.cpu_side_ports
# Sanity check
if args.simpoint_profile:
if not ObjectList.is_noncaching_cpu(TestCPUClass):
fatal("SimPoint generation should be done with atomic cpu")
if np > 1:
fatal(
                "SimPoint generation is not supported with more than one CPU"
)
for i in range(np):
if args.simpoint_profile:
test_sys.cpu[i].addSimPointProbe(args.simpoint_interval)
if args.checker:
test_sys.cpu[i].addCheckerCpu()
if not ObjectList.is_kvm_cpu(TestCPUClass):
if args.bp_type:
bpClass = ObjectList.bp_list.get(args.bp_type)
test_sys.cpu[i].branchPred = bpClass()
if args.indirect_bp_type:
IndirectBPClass = ObjectList.indirect_bp_list.get(
args.indirect_bp_type
)
test_sys.cpu[
i
].branchPred.indirectBranchPred = IndirectBPClass()
test_sys.cpu[i].createThreads()
# If elastic tracing is enabled when not restoring from checkpoint and
# when not fast forwarding using the atomic cpu, then check that the
# TestCPUClass is DerivO3CPU or inherits from DerivO3CPU. If the check
# passes then attach the elastic trace probe.
# If restoring from checkpoint or fast forwarding, the code that does this for
# FutureCPUClass is in the Simulation module. If the check passes then the
# elastic trace probe is attached to the switch CPUs.
if (
args.elastic_trace_en
and args.checkpoint_restore == None
and not args.fast_forward
):
CpuConfig.config_etrace(TestCPUClass, test_sys.cpu, args)
CacheConfig.config_cache(args, test_sys)
MemConfig.config_mem(args, test_sys)
if ObjectList.is_kvm_cpu(TestCPUClass) or ObjectList.is_kvm_cpu(
FutureClass
):
# Assign KVM CPUs to their own event queues / threads. This
# has to be done after creating caches and other child objects
# since these mustn't inherit the CPU event queue.
for i, cpu in enumerate(test_sys.cpu):
# Child objects usually inherit the parent's event
# queue. Override that and use the same event queue for
# all devices.
for obj in cpu.descendants():
obj.eventq_index = 0
cpu.eventq_index = i + 1
test_sys.kvm_vm = KvmVM()
return test_sys
def build_drive_system(np):
# driver system CPU is always simple, so is the memory
# Note this is an assignment of a class, not an instance.
DriveCPUClass = AtomicSimpleCPU
drive_mem_mode = "atomic"
DriveMemClass = SimpleMemory
cmdline = cmd_line_template()
if buildEnv["USE_MIPS_ISA"]:
drive_sys = makeLinuxMipsSystem(drive_mem_mode, bm[1], cmdline=cmdline)
elif buildEnv["USE_SPARC_ISA"]:
drive_sys = makeSparcSystem(drive_mem_mode, bm[1], cmdline=cmdline)
elif buildEnv["USE_X86_ISA"]:
drive_sys = makeLinuxX86System(
drive_mem_mode, np, bm[1], cmdline=cmdline
)
elif buildEnv["USE_ARM_ISA"]:
drive_sys = makeArmSystem(
drive_mem_mode,
args.machine_type,
np,
bm[1],
args.dtb_filename,
cmdline=cmdline,
)
# Create a top-level voltage domain
drive_sys.voltage_domain = VoltageDomain(voltage=args.sys_voltage)
# Create a source clock for the system and set the clock period
drive_sys.clk_domain = SrcClockDomain(
clock=args.sys_clock, voltage_domain=drive_sys.voltage_domain
)
# Create a CPU voltage domain
drive_sys.cpu_voltage_domain = VoltageDomain()
# Create a source clock for the CPUs and set the clock period
drive_sys.cpu_clk_domain = SrcClockDomain(
clock=args.cpu_clock, voltage_domain=drive_sys.cpu_voltage_domain
)
drive_sys.cpu = DriveCPUClass(
clk_domain=drive_sys.cpu_clk_domain, cpu_id=0
)
drive_sys.cpu.createThreads()
drive_sys.cpu.createInterruptController()
drive_sys.cpu.connectBus(drive_sys.membus)
if args.kernel is not None:
drive_sys.workload.object_file = binary(args.kernel)
if ObjectList.is_kvm_cpu(DriveCPUClass):
drive_sys.kvm_vm = KvmVM()
drive_sys.iobridge = Bridge(delay="50ns", ranges=drive_sys.mem_ranges)
drive_sys.iobridge.cpu_side_port = drive_sys.iobus.mem_side_ports
drive_sys.iobridge.mem_side_port = drive_sys.membus.cpu_side_ports
# Create the appropriate memory controllers and connect them to the
# memory bus
drive_sys.mem_ctrls = [
DriveMemClass(range=r) for r in drive_sys.mem_ranges
]
for i in range(len(drive_sys.mem_ctrls)):
drive_sys.mem_ctrls[i].port = drive_sys.membus.mem_side_ports
drive_sys.init_param = args.init_param
return drive_sys
# Add args
parser = argparse.ArgumentParser()
Options.addCommonOptions(parser)
Options.addFSOptions(parser)
# Add the ruby specific and protocol specific args
if "--ruby" in sys.argv:
Ruby.define_options(parser)
args = parser.parse_args()
# system under test can be any CPU
(TestCPUClass, test_mem_mode, FutureClass) = Simulation.setCPUClass(args)
# Match the memories with the CPUs, based on the options for the test system
TestMemClass = Simulation.setMemClass(args)
if args.benchmark:
try:
bm = Benchmarks[args.benchmark]
except KeyError:
        print("Error: benchmark %s has not been defined." % args.benchmark)
print("Valid benchmarks are: %s" % DefinedBenchmarks)
sys.exit(1)
else:
if args.dual:
bm = [
SysConfig(
disks=args.disk_image,
rootdev=args.root_device,
mem=args.mem_size,
os_type=args.os_type,
),
SysConfig(
disks=args.disk_image,
rootdev=args.root_device,
mem=args.mem_size,
os_type=args.os_type,
),
]
else:
bm = [
SysConfig(
disks=args.disk_image,
rootdev=args.root_device,
mem=args.mem_size,
os_type=args.os_type,
)
]
np = args.num_cpus
test_sys = build_test_system(np)
if len(bm) == 2:
drive_sys = build_drive_system(np)
root = makeDualRoot(True, test_sys, drive_sys, args.etherdump)
elif len(bm) == 1 and args.dist:
# This system is part of a dist-gem5 simulation
root = makeDistRoot(
test_sys,
args.dist_rank,
args.dist_size,
args.dist_server_name,
args.dist_server_port,
args.dist_sync_repeat,
args.dist_sync_start,
args.ethernet_linkspeed,
args.ethernet_linkdelay,
args.etherdump,
)
elif len(bm) == 1:
root = Root(full_system=True, system=test_sys)
else:
print("Error I don't know how to create more than 2 systems.")
sys.exit(1)
if ObjectList.is_kvm_cpu(TestCPUClass) or ObjectList.is_kvm_cpu(FutureClass):
# Required for running kvm on multiple host cores.
# Uses gem5's parallel event queue feature
# Note: The simulator is quite picky about this number!
root.sim_quantum = int(1e9) # 1 ms
if args.timesync:
root.time_sync_enable = True
if args.frame_capture:
VncServer.frame_capture = True
if buildEnv["USE_ARM_ISA"] and not args.bare_metal and not args.dtb_filename:
if args.machine_type not in [
"VExpress_GEM5",
"VExpress_GEM5_V1",
"VExpress_GEM5_V2",
"VExpress_GEM5_Foundation",
]:
warn(
"Can only correctly generate a dtb for VExpress_GEM5_* "
"platforms, unless custom hardware models have been equipped "
"with generation functionality."
)
# Generate a Device Tree
for sysname in ("system", "testsys", "drivesys"):
if hasattr(root, sysname):
sys = getattr(root, sysname)
sys.workload.dtb_filename = os.path.join(
m5.options.outdir, "%s.dtb" % sysname
)
sys.generateDtb(sys.workload.dtb_filename)
if args.wait_gdb:
test_sys.workload.wait_for_remote_gdb = True
Simulation.setWorkCountOptions(test_sys, args)
Simulation.run(args, root, test_sys, FutureClass)
fatal(
"The 'configs/example/fs.py' script has been deprecated. It can be "
"found in 'configs/deprecated/example' if required. Its usage should be "
"avoided as it will be removed in future releases of gem5."
)


@@ -90,7 +90,7 @@ board = SimpleBoard(
board.set_se_binary_workload(
# the workload should be the same as the save-checkpoint script
Resource("riscv-hello"),
checkpoint=Resource("riscv-hello-example-checkpoint-v22-1"),
checkpoint=Resource("riscv-hello-example-checkpoint-v23"),
)
simulator = Simulator(


@@ -58,6 +58,7 @@ from gem5.components.processors.simple_processor import SimpleProcessor
from gem5.components.processors.cpu_types import CPUTypes
from gem5.isas import ISA
from gem5.resources.workload import Workload
from gem5.resources.resource import obtain_resource, SimpointResource
from pathlib import Path
from gem5.components.cachehierarchies.classic.no_cache import NoCache
from gem5.simulate.exit_event_generators import (
@@ -108,7 +109,23 @@ board = SimpleBoard(
cache_hierarchy=cache_hierarchy,
)
board.set_workload(Workload("x86-print-this-15000-with-simpoints"))
# board.set_workload(
# Workload("x86-print-this-15000-with-simpoints")
#
# **Note: This has been removed until we update the resources.json file to
# encapsulate the new Simpoint format.
# Below we set the simpoint manually.
board.set_se_simpoint_workload(
binary=obtain_resource("x86-print-this"),
arguments=["print this", 15000],
simpoint=SimpointResource(
simpoint_interval=1000000,
simpoint_list=[2, 3, 4, 15],
weight_list=[0.1, 0.2, 0.4, 0.3],
warmup_interval=1000000,
),
)
dir = Path(args.checkpoint_path)
dir.mkdir(exist_ok=True)


@@ -63,8 +63,9 @@ from gem5.components.memory import DualChannelDDR4_2400
from gem5.components.processors.simple_processor import SimpleProcessor
from gem5.components.processors.cpu_types import CPUTypes
from gem5.isas import ISA
from gem5.resources.resource import Resource
from gem5.resources.resource import SimpointResource, obtain_resource
from gem5.resources.workload import Workload
from gem5.resources.resource import SimpointResource
from pathlib import Path
from m5.stats import reset, dump
@@ -96,11 +97,29 @@ board = SimpleBoard(
cache_hierarchy=cache_hierarchy,
)
# Here we obtain the workloadfrom gem5 resources, the checkpoint in this
# Here we obtain the workload from gem5 resources, the checkpoint in this
# workload was generated from
# `configs/example/gem5_library/checkpoints/simpoints-se-checkpoint.py`.
board.set_workload(
Workload("x86-print-this-15000-with-simpoints-and-checkpoint")
# board.set_workload(
# Workload("x86-print-this-15000-with-simpoints-and-checkpoint")
#
# **Note: This has been removed until we update the resources.json file to
# encapsulate the new Simpoint format.
# Below we set the simpoint manually.
#
# This loads a single checkpoint as an example of using simpoints to simulate
# the function of a single simpoint region.
board.set_se_simpoint_workload(
binary=obtain_resource("x86-print-this"),
arguments=["print this", 15000],
simpoint=SimpointResource(
simpoint_interval=1000000,
simpoint_list=[2, 3, 4, 15],
weight_list=[0.1, 0.2, 0.4, 0.3],
warmup_interval=1000000,
),
checkpoint=obtain_resource("simpoints-se-checkpoints-v23-0-v1"),
)


@@ -0,0 +1,92 @@
# Copyright (c) 2021 The Regents of the University of California
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer;
# redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution;
# neither the name of the copyright holders nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""
This gem5 configuration script creates a simple board to run an ARM
"hello world" binary using the DRAMSys simulator.
**Important Note**: DRAMSys must be compiled into the gem5 binary to use the
DRAMSys simulator. Please consult 'ext/dramsys/README' for compilation
instructions. If this is not done correctly, this script will fail with an error.
"""
from gem5.isas import ISA
from gem5.utils.requires import requires
from gem5.resources.resource import Resource
from gem5.components.memory import DRAMSysDDR3_1600
from gem5.components.processors.cpu_types import CPUTypes
from gem5.components.boards.simple_board import SimpleBoard
from gem5.components.cachehierarchies.classic.private_l1_cache_hierarchy import (
PrivateL1CacheHierarchy,
)
from gem5.components.processors.simple_processor import SimpleProcessor
from gem5.simulate.simulator import Simulator
# This check ensures the gem5 binary is compiled to the ARM ISA target. If not,
# an exception will be thrown.
requires(isa_required=ISA.ARM)
# We need a cache as DRAMSys only accepts requests with the size of a cache line
cache_hierarchy = PrivateL1CacheHierarchy(l1d_size="32kB", l1i_size="32kB")
# We use a single channel DDR3_1600 memory system
memory = DRAMSysDDR3_1600(recordable=True)
# We use a simple Timing processor with one core.
processor = SimpleProcessor(cpu_type=CPUTypes.TIMING, isa=ISA.ARM, num_cores=1)
# The gem5 library simple board which can be used to run simple SE-mode
# simulations.
board = SimpleBoard(
clk_freq="3GHz",
processor=processor,
memory=memory,
cache_hierarchy=cache_hierarchy,
)
# Here we set the workload. In this case we want to run a simple "Hello World!"
# program compiled to the ARM ISA. The `Resource` class will automatically
# download the binary from the gem5 Resources cloud bucket if it's not already
# present.
board.set_se_binary_workload(
# The `Resource` class reads the `resources.json` file from the gem5
# resources repository:
# https://gem5.googlesource.com/public/gem5-resources.
# Any resource specified in this file will be automatically retrieved.
# At the time of writing, this file is a WIP and does not contain all
# resources. Jira ticket: https://gem5.atlassian.net/browse/GEM5-1096
Resource("arm-hello64-static")
)
# Lastly we run the simulation.
simulator = Simulator(board=board)
simulator.run()
print(
"Exiting @ tick {} because {}.".format(
simulator.get_current_tick(), simulator.get_last_exit_event_cause()
)
)

View File

@@ -0,0 +1,62 @@
# Copyright (c) 2023 The Regents of the University of California
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer;
# redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution;
# neither the name of the copyright holders nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""
This script is used for running a traffic generator connected to the
DRAMSys simulator.
**Important Note**: DRAMSys must be compiled into the gem5 binary to use the
DRAMSys simulator. Please consult 'ext/dramsys/README' for compilation
instructions. If this is not done correctly, this script will fail with an error.
"""
import m5
from gem5.components.memory import DRAMSysMem
from gem5.components.boards.test_board import TestBoard
from gem5.components.processors.linear_generator import LinearGenerator
from m5.objects import Root
memory = DRAMSysMem(
configuration="ext/dramsys/DRAMSys/DRAMSys/"
"library/resources/simulations/ddr4-example.json",
resource_directory="ext/dramsys/DRAMSys/DRAMSys/library/resources",
recordable=True,
size="4GB",
)
generator = LinearGenerator(
duration="250us",
rate="40GB/s",
num_cores=1,
max_addr=memory.get_size(),
)
board = TestBoard(
clk_freq="3GHz", generator=generator, memory=memory, cache_hierarchy=None
)
root = Root(full_system=False, system=board)
board._pre_instantiate()
m5.instantiate()
generator.start_traffic()
exit_event = m5.simulate()
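As a rough sanity check on the generator parameters above (this sketch is illustrative plain Python, not part of the gem5 API): a 40GB/s linear stream over 250us moves about 10MB of data, well within the 4GB address range it sweeps.

```python
def bytes_transferred(rate_gb_per_s: float, duration_us: float) -> int:
    """Approximate number of bytes a generator at `rate` GB/s issues in `duration` us."""
    return int(rate_gb_per_s * 1e9 * duration_us * 1e-6)

print(bytes_transferred(40, 250))  # 10000000 bytes, i.e. ~10 MB
```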

View File

@@ -0,0 +1,138 @@
# Copyright (c) 2023 The Regents of the University of California
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer;
# redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution;
# neither the name of the copyright holders nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""
This configuration script shows an example of how to take checkpoints for
LoopPoint using the gem5 stdlib. To take checkpoints for LoopPoint simulation
regions, there must be a LoopPoint data file generated by Pin or the gem5
simulator. With the information in the LoopPoint data file, the stdlib
modules will take checkpoints at the beginning of the simulation regions
(warmup region included, if it exists) and record all the information needed
for restoring into a JSON file. The JSON file is required for later restores,
so please call `looppoint.output_json_file()` at the end of the simulation.
This script builds a simple board with the gem5 stdlib with no cache and a
simple memory structure to take checkpoints. Some of the components, such as
cache hierarchy, can be changed when restoring checkpoints.
Usage
-----
```
scons build/X86/gem5.opt
./build/X86/gem5.opt \
configs/example/gem5_library/looppoints/create-looppoint-checkpoint.py
```
"""
from gem5.simulate.exit_event import ExitEvent
from gem5.simulate.simulator import Simulator
from gem5.utils.requires import requires
from gem5.components.cachehierarchies.classic.no_cache import NoCache
from gem5.components.boards.simple_board import SimpleBoard
from gem5.components.memory.single_channel import SingleChannelDDR3_1600
from gem5.components.processors.simple_processor import SimpleProcessor
from gem5.components.processors.cpu_types import CPUTypes
from gem5.isas import ISA
from gem5.resources.workload import Workload
from pathlib import Path
from gem5.simulate.exit_event_generators import (
looppoint_save_checkpoint_generator,
)
import argparse
requires(isa_required=ISA.X86)
parser = argparse.ArgumentParser(
description="An example looppoint workload file path"
)
# The lone argument is a file path to the directory in which to store the checkpoints.
parser.add_argument(
"--checkpoint-path",
type=str,
required=False,
default="looppoint_checkpoints_folder",
help="The directory to store the checkpoints.",
)
args = parser.parse_args()
# When taking a checkpoint, the cache state is not saved, so the cache
# hierarchy can be changed completely when restoring from a checkpoint.
# By using NoCache() to take checkpoints, it can slightly improve the
# performance when running in atomic mode, and it will not put any restrictions
# on what people can do with the checkpoints.
cache_hierarchy = NoCache()
# Using simple memory to take checkpoints might slightly improve the
# performance in atomic mode. The memory structure can be changed when
# restoring from a checkpoint, but the size of the memory must be equal to or
# greater than that used when creating the checkpoint.
memory = SingleChannelDDR3_1600(size="2GB")
processor = SimpleProcessor(
cpu_type=CPUTypes.ATOMIC,
isa=ISA.X86,
# LoopPoint can work with multicore workloads
num_cores=9,
)
board = SimpleBoard(
clk_freq="3GHz",
processor=processor,
memory=memory,
cache_hierarchy=cache_hierarchy,
)
board.set_workload(Workload("x86-matrix-multiply-omp-100-8-looppoint-csv"))
dir = Path(args.checkpoint_path)
dir.mkdir(exist_ok=True)
simulator = Simulator(
board=board,
on_exit_event={
ExitEvent.SIMPOINT_BEGIN: looppoint_save_checkpoint_generator(
checkpoint_dir=dir,
looppoint=board.get_looppoint(),
# True if the relative PC count pairs should be updated during the
# simulation. Defaults to True.
update_relatives=True,
# True if the simulation loop should exit after all the PC count
# pairs in the LoopPoint data file have been encountered.
# Defaults to True.
exit_when_empty=True,
)
},
)
simulator.run()
# Output the JSON file
board.get_looppoint().output_json_file()

View File

@@ -0,0 +1,139 @@
# Copyright (c) 2023 The Regents of the University of California
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer;
# redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution;
# neither the name of the copyright holders nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""
This configuration script shows an example of how to restore a checkpoint that
was taken for a LoopPoint simulation region by `create-looppoint-checkpoint.py`.
All the LoopPoint information should be passed in through the JSON file
generated by the gem5 simulator when all the checkpoints were taken.
This script builds a more complex board than the board used for taking
checkpoints.
Usage
-----
```
./build/X86/gem5.opt \
configs/example/gem5_library/looppoints/restore-looppoint-checkpoint.py
```
"""
import argparse
from gem5.simulate.exit_event import ExitEvent
from gem5.simulate.simulator import Simulator
from gem5.utils.requires import requires
from gem5.components.cachehierarchies.classic.private_l1_private_l2_cache_hierarchy import (
PrivateL1PrivateL2CacheHierarchy,
)
from gem5.components.boards.simple_board import SimpleBoard
from gem5.components.memory import DualChannelDDR4_2400
from gem5.components.processors.simple_processor import SimpleProcessor
from gem5.components.processors.cpu_types import CPUTypes
from gem5.isas import ISA
from gem5.resources.resource import obtain_resource
from gem5.resources.workload import Workload
from m5.stats import reset, dump
requires(isa_required=ISA.X86)
parser = argparse.ArgumentParser(description="A checkpoint-restore script.")
parser.add_argument(
"--checkpoint-region",
type=str,
required=False,
choices=(
"1",
"2",
"3",
"5",
"6",
"7",
"8",
"9",
"10",
"11",
"12",
"13",
"14",
),
default="1",
help="The checkpoint region to restore from.",
)
args = parser.parse_args()
# The cache hierarchy can be different from the cache hierarchy used in taking
# the checkpoints
cache_hierarchy = PrivateL1PrivateL2CacheHierarchy(
l1d_size="32kB",
l1i_size="32kB",
l2_size="256kB",
)
# The memory structure can be different from the memory structure used in
# taking the checkpoints, but the size of the memory must be equal or larger.
memory = DualChannelDDR4_2400(size="2GB")
processor = SimpleProcessor(
cpu_type=CPUTypes.TIMING,
isa=ISA.X86,
# The number of cores must be equal to or greater than the number used when
# taking the checkpoint.
num_cores=9,
)
board = SimpleBoard(
clk_freq="3GHz",
processor=processor,
memory=memory,
cache_hierarchy=cache_hierarchy,
)
board.set_workload(
Workload(
f"x86-matrix-multiply-omp-100-8-looppoint-region-{args.checkpoint_region}"
)
)
# This generator will dump the stats and exit the simulation loop when the
# simulation region reaches its end. If there is a warmup interval,
# the simulation stats are reset after the warmup completes.
def reset_and_dump():
if len(board.get_looppoint().get_targets()) > 1:
print("Warmup region ended. Resetting stats.")
reset()
yield False
print("Region ended. Dumping stats.")
dump()
yield True
simulator = Simulator(
board=board,
on_exit_event={ExitEvent.SIMPOINT_BEGIN: reset_and_dump()},
)
simulator.run()
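The `reset_and_dump` generator above relies on the stdlib convention that an exit-event handler yields `False` to continue the simulation and `True` to stop it. A self-contained sketch of the same control flow, with plain prints standing in for the gem5 stat calls:

```python
def reset_and_dump_sketch(num_targets: int):
    """Mimic the handler above: optional warmup reset, then a final dump."""
    if num_targets > 1:
        print("Warmup region ended. Resetting stats.")
        yield False  # continue simulating through the region of interest
    print("Region ended. Dumping stats.")
    yield True  # exit the simulation loop

handler = reset_and_dump_sketch(2)
print(next(handler))  # False: warmup boundary, keep simulating
print(next(handler))  # True: region end, stop
```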

View File

@@ -0,0 +1,89 @@
# Copyright (c) 2023 The Regents of the University of California
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer;
# redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution;
# neither the name of the copyright holders nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""
This gem5 configuration script creates a simple board to run a POWER
"hello world" binary.
This is close to the simplest setup possible using the gem5
library. It does not contain any kind of caching, IO, or any non-essential
components.
Usage
-----
```
scons build/POWER/gem5.opt
./build/POWER/gem5.opt configs/example/gem5_library/power-hello.py
```
"""
from gem5.isas import ISA
from gem5.utils.requires import requires
from gem5.resources.resource import Resource
from gem5.components.memory import SingleChannelDDR4_2400
from gem5.components.processors.cpu_types import CPUTypes
from gem5.components.boards.simple_board import SimpleBoard
from gem5.components.cachehierarchies.classic.no_cache import NoCache
from gem5.components.processors.simple_processor import SimpleProcessor
from gem5.simulate.simulator import Simulator
# This check ensures the gem5 binary is compiled to the POWER ISA target.
# If not, an exception will be thrown.
requires(isa_required=ISA.POWER)
# In this setup we don't have a cache. `NoCache` can be used for such setups.
cache_hierarchy = NoCache()
# We use a single channel DDR4_2400 memory system
memory = SingleChannelDDR4_2400(size="32MB")
# We use a simple ATOMIC processor with one core.
processor = SimpleProcessor(
cpu_type=CPUTypes.ATOMIC, isa=ISA.POWER, num_cores=1
)
# The gem5 library simple board which can be used to run simple SE-mode
# simulations.
board = SimpleBoard(
clk_freq="3GHz",
processor=processor,
memory=memory,
cache_hierarchy=cache_hierarchy,
)
board.set_se_binary_workload(Resource("power-hello"))
# Lastly we run the simulation.
simulator = Simulator(board=board)
simulator.run()
print(
"Exiting @ tick {} because {}.".format(
simulator.get_current_tick(),
simulator.get_last_exit_event_cause(),
)
)

View File

@@ -39,9 +39,7 @@ scons build/RISCV/gem5.opt
from gem5.resources.resource import Resource
from gem5.simulate.simulator import Simulator
from python.gem5.prebuilt.riscvmatched.riscvmatched_board import (
RISCVMatchedBoard,
)
from gem5.prebuilt.riscvmatched.riscvmatched_board import RISCVMatchedBoard
from gem5.isas import ISA
from gem5.utils.requires import requires

View File

@@ -195,9 +195,9 @@ if args.synthetic == "1":
)
exit(-1)
command = "./{} -g {}\n".format(args.benchmark, args.size)
command = f"./{args.benchmark} -g {args.size}\n"
else:
command = "./{} -sf ../{}".format(args.benchmark, args.size)
command = f"./{args.benchmark} -sf ../{args.size}"
board.set_kernel_disk_workload(
# The x86 linux kernel will be automatically downloaded to the
@@ -262,7 +262,9 @@ print("Done with the simulation")
print()
print("Performance statistics:")
print("Simulated time in ROI: %.2fs" % ((end_tick - start_tick) / 1e12))
print(
f"Simulated time in ROI: {(end_tick - start_tick) / 1000000000000.0:.2f}s"
)
print(
"Ran a total of", simulator.get_current_tick() / 1e12, "simulated seconds"
)

View File

@@ -195,7 +195,7 @@ board = X86Board(
# properly.
command = (
"/home/gem5/NPB3.3-OMP/bin/{}.{}.x;".format(args.benchmark, args.size)
f"/home/gem5/NPB3.3-OMP/bin/{args.benchmark}.{args.size}.x;"
+ "sleep 5;"
+ "m5 exit;"
)

View File

@@ -177,10 +177,7 @@ board = X86Board(
command = (
"cd /home/gem5/parsec-benchmark;".format(args.benchmark)
+ "source env.sh;"
+ "parsecmgmt -a run -p {} -c gcc-hooks -i {} \
-n {};".format(
args.benchmark, args.size, "2"
)
+ f"parsecmgmt -a run -p {args.benchmark} -c gcc-hooks -i {args.size} -n 2;"
+ "sleep 5;"
+ "m5 exit;"
)

View File

@@ -179,7 +179,7 @@ if not os.path.exists(args.image):
print(
"https://gem5art.readthedocs.io/en/latest/tutorials/spec-tutorial.html"
)
fatal("The disk-image is not found at {}".format(args.image))
fatal(f"The disk-image is not found at {args.image}")
# Setting up all the fixed system parameters here
# Caches: MESI Two Level Cache Hierarchy
@@ -252,7 +252,7 @@ except FileExistsError:
# The runscript.sh file places `m5 exit` before and after the following command
# Therefore, we only pass this command without m5 exit.
command = "{} {} {}".format(args.benchmark, args.size, output_dir)
command = f"{args.benchmark} {args.size} {output_dir}"
board.set_kernel_disk_workload(
# The x86 linux kernel will be automatically downloaded to the
@@ -262,7 +262,7 @@ board.set_kernel_disk_workload(
kernel=Resource("x86-linux-kernel-4.19.83"),
# The location of the x86 SPEC CPU 2017 image
disk_image=CustomDiskImageResource(
args.image, disk_root_partition=args.partition
args.image, root_partition=args.partition
),
readfile_contents=command,
)
@@ -272,6 +272,7 @@ def handle_exit():
print("Done bootling Linux")
print("Resetting stats at the start of ROI!")
m5.stats.reset()
processor.switch()
yield False # E.g., continue the simulation.
print("Dump stats at the end of the ROI!")
m5.stats.dump()
@@ -304,7 +305,11 @@ print("All simulation events were successful.")
print("Performance statistics:")
print("Simulated time: " + ((str(simulator.get_roi_ticks()[0]))))
roi_begin_ticks = simulator.get_tick_stopwatch()[0][1]
roi_end_ticks = simulator.get_tick_stopwatch()[1][1]
print("roi simulated ticks: " + str(roi_end_ticks - roi_begin_ticks))
print(
"Ran a total of", simulator.get_current_tick() / 1e12, "simulated seconds"
)

View File

@@ -193,7 +193,7 @@ if not os.path.exists(args.image):
print(
"https://gem5art.readthedocs.io/en/latest/tutorials/spec-tutorial.html"
)
fatal("The disk-image is not found at {}".format(args.image))
fatal(f"The disk-image is not found at {args.image}")
# Setting up all the fixed system parameters here
# Caches: MESI Two Level Cache Hierarchy
@@ -266,7 +266,7 @@ except FileExistsError:
# The runscript.sh file places `m5 exit` before and after the following command
# Therefore, we only pass this command without m5 exit.
command = "{} {} {}".format(args.benchmark, args.size, output_dir)
command = f"{args.benchmark} {args.size} {output_dir}"
# For enabling CustomResource, we pass an additional parameter to mount the
# correct partition.
@@ -278,7 +278,7 @@ board.set_kernel_disk_workload(
kernel=Resource("x86-linux-kernel-4.19.83"),
# The location of the x86 SPEC CPU 2017 image
disk_image=CustomDiskImageResource(
args.image, disk_root_partition=args.partition
args.image, root_partition=args.partition
),
readfile_contents=command,
)
@@ -288,6 +288,7 @@ def handle_exit():
print("Done bootling Linux")
print("Resetting stats at the start of ROI!")
m5.stats.reset()
processor.switch()
yield False # E.g., continue the simulation.
print("Dump stats at the end of the ROI!")
m5.stats.dump()
@@ -319,7 +320,11 @@ print("Done with the simulation")
print()
print("Performance statistics:")
print("Simulated time in ROI: " + ((str(simulator.get_roi_ticks()[0]))))
roi_begin_ticks = simulator.get_tick_stopwatch()[0][1]
roi_end_ticks = simulator.get_tick_stopwatch()[1][1]
print("roi simulated ticks: " + str(roi_end_ticks - roi_begin_ticks))
print(
"Ran a total of", simulator.get_current_tick() / 1e12, "simulated seconds"
)

View File

@@ -48,7 +48,7 @@ class DisjointSimple(SimpleNetwork):
def connectCPU(self, opts, controllers):
# Setup parameters for makeTopology call for CPU network
topo_module = import_module("topologies.%s" % opts.cpu_topology)
topo_module = import_module(f"topologies.{opts.cpu_topology}")
topo_class = getattr(topo_module, opts.cpu_topology)
_topo = topo_class(controllers)
_topo.makeTopology(opts, self, SimpleIntLink, SimpleExtLink, Switch)
@@ -58,7 +58,7 @@ class DisjointSimple(SimpleNetwork):
def connectGPU(self, opts, controllers):
# Setup parameters for makeTopology call for GPU network
topo_module = import_module("topologies.%s" % opts.gpu_topology)
topo_module = import_module(f"topologies.{opts.gpu_topology}")
topo_class = getattr(topo_module, opts.gpu_topology)
_topo = topo_class(controllers)
_topo.makeTopology(opts, self, SimpleIntLink, SimpleExtLink, Switch)
@@ -84,7 +84,7 @@ class DisjointGarnet(GarnetNetwork):
def connectCPU(self, opts, controllers):
# Setup parameters for makeTopology call for CPU network
topo_module = import_module("topologies.%s" % opts.cpu_topology)
topo_module = import_module(f"topologies.{opts.cpu_topology}")
topo_class = getattr(topo_module, opts.cpu_topology)
_topo = topo_class(controllers)
_topo.makeTopology(
@@ -96,7 +96,7 @@ class DisjointGarnet(GarnetNetwork):
def connectGPU(self, opts, controllers):
# Setup parameters for makeTopology call
topo_module = import_module("topologies.%s" % opts.gpu_topology)
topo_module = import_module(f"topologies.{opts.gpu_topology}")
topo_class = getattr(topo_module, opts.gpu_topology)
_topo = topo_class(controllers)
_topo.makeTopology(

View File

@@ -49,7 +49,7 @@ def addAmdGPUOptions(parser):
"--cu-per-sqc",
type=int,
default=4,
help="number of CUs sharing an SQC" " (icache, and thus icache TLB)",
help="number of CUs sharing an SQC (icache, and thus icache TLB)",
)
parser.add_argument(
"--cu-per-scalar-cache",
@@ -102,19 +102,19 @@ def addAmdGPUOptions(parser):
"--issue-period",
type=int,
default=4,
help="Number of cycles per vector instruction issue" " period",
help="Number of cycles per vector instruction issue period",
)
parser.add_argument(
"--glbmem-wr-bus-width",
type=int,
default=32,
help="VGPR to Coalescer (Global Memory) data bus width" " in bytes",
help="VGPR to Coalescer (Global Memory) data bus width in bytes",
)
parser.add_argument(
"--glbmem-rd-bus-width",
type=int,
default=32,
help="Coalescer to VGPR (Global Memory) data bus width" " in bytes",
help="Coalescer to VGPR (Global Memory) data bus width in bytes",
)
# Currently we only support 1 local memory pipe
parser.add_argument(
@@ -204,20 +204,20 @@ def addAmdGPUOptions(parser):
parser.add_argument(
"--LocalMemBarrier",
action="store_true",
help="Barrier does not wait for writethroughs to " " complete",
help="Barrier does not wait for writethroughs to complete",
)
parser.add_argument(
"--countPages",
action="store_true",
help="Count Page Accesses and output in " " per-CU output files",
help="Count Page Accesses and output in per-CU output files",
)
parser.add_argument(
"--TLB-prefetch", type=int, help="prefetch depth for" "TLBs"
"--TLB-prefetch", type=int, help="prefetch depth for TLBs"
)
parser.add_argument(
"--pf-type",
type=str,
help="type of prefetch: " "PF_CU, PF_WF, PF_PHASE, PF_STRIDE",
help="type of prefetch: PF_CU, PF_WF, PF_PHASE, PF_STRIDE",
)
parser.add_argument("--pf-stride", type=int, help="set prefetch stride")
parser.add_argument(

View File

@@ -42,7 +42,7 @@ from ruby import Ruby
cookbook_runscript = """\
export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH
export HSA_ENABLE_INTERRUPT=0
dmesg -n3
dmesg -n8
dd if=/root/roms/vega10.rom of=/dev/mem bs=1k seek=768 count=128
if [ ! -f /lib/modules/`uname -r`/updates/dkms/amdgpu.ko ]; then
echo "ERROR: Missing DKMS package for kernel `uname -r`. Exiting gem5."
@@ -99,18 +99,16 @@ if __name__ == "__m5_main__":
# Create temp script to run application
if args.app is None:
print("No application given. Use %s -a <app>" % sys.argv[0])
print(f"No application given. Use {sys.argv[0]} -a <app>")
sys.exit(1)
elif args.kernel is None:
print("No kernel path given. Use %s --kernel <vmlinux>" % sys.argv[0])
print(f"No kernel path given. Use {sys.argv[0]} --kernel <vmlinux>")
sys.exit(1)
elif args.disk_image is None:
print("No disk path given. Use %s --disk-image <linux>" % sys.argv[0])
print(f"No disk path given. Use {sys.argv[0]} --disk-image <linux>")
sys.exit(1)
elif args.gpu_mmio_trace is None:
print(
"No MMIO trace path. Use %s --gpu-mmio-trace <path>" % sys.argv[0]
)
print(f"No MMIO trace path. Use {sys.argv[0]} --gpu-mmio-trace <path>")
sys.exit(1)
_, tempRunscript = tempfile.mkstemp()

View File

@@ -43,7 +43,7 @@ from ruby import Ruby
rodinia_runscript = """\
export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH
export HSA_ENABLE_INTERRUPT=0
dmesg -n3
dmesg -n8
dd if=/root/roms/vega10.rom of=/dev/mem bs=1k seek=768 count=128
if [ ! -f /lib/modules/`uname -r`/updates/dkms/amdgpu.ko ]; then
echo "ERROR: Missing DKMS package for kernel `uname -r`. Exiting gem5."
@@ -107,18 +107,16 @@ if __name__ == "__m5_main__":
# Create temp script to run application
if args.app is None:
print("No application given. Use %s -a <app>" % sys.argv[0])
print(f"No application given. Use {sys.argv[0]} -a <app>")
sys.exit(1)
elif args.kernel is None:
print("No kernel path given. Use %s --kernel <vmlinux>" % sys.argv[0])
print(f"No kernel path given. Use {sys.argv[0]} --kernel <vmlinux>")
sys.exit(1)
elif args.disk_image is None:
print("No disk path given. Use %s --disk-image <linux>" % sys.argv[0])
print(f"No disk path given. Use {sys.argv[0]} --disk-image <linux>")
sys.exit(1)
elif args.gpu_mmio_trace is None:
print(
"No MMIO trace path. Use %s --gpu-mmio-trace <path>" % sys.argv[0]
)
print(f"No MMIO trace path. Use {sys.argv[0]} --gpu-mmio-trace <path>")
sys.exit(1)
_, tempRunscript = tempfile.mkstemp()

View File

@@ -42,7 +42,7 @@ from ruby import Ruby
samples_runscript = """\
export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH
export HSA_ENABLE_INTERRUPT=0
dmesg -n3
dmesg -n8
dd if=/root/roms/vega10.rom of=/dev/mem bs=1k seek=768 count=128
if [ ! -f /lib/modules/`uname -r`/updates/dkms/amdgpu.ko ]; then
echo "ERROR: Missing DKMS package for kernel `uname -r`. Exiting gem5."
@@ -97,18 +97,16 @@ if __name__ == "__m5_main__":
# Create temp script to run application
if args.app is None:
print("No application given. Use %s -a <app>" % sys.argv[0])
print(f"No application given. Use {sys.argv[0]} -a <app>")
sys.exit(1)
elif args.kernel is None:
print("No kernel path given. Use %s --kernel <vmlinux>" % sys.argv[0])
print(f"No kernel path given. Use {sys.argv[0]} --kernel <vmlinux>")
sys.exit(1)
elif args.disk_image is None:
print("No disk path given. Use %s --disk-image <linux>" % sys.argv[0])
print(f"No disk path given. Use {sys.argv[0]} --disk-image <linux>")
sys.exit(1)
elif args.gpu_mmio_trace is None:
print(
"No MMIO trace path. Use %s --gpu-mmio-trace <path>" % sys.argv[0]
)
print(f"No MMIO trace path. Use {sys.argv[0]} --gpu-mmio-trace <path>")
sys.exit(1)
_, tempRunscript = tempfile.mkstemp()

View File

@@ -30,6 +30,7 @@
# System includes
import argparse
import math
import hashlib
# gem5 related
import m5
@@ -110,13 +111,13 @@ def addRunFSOptions(parser):
action="store",
type=str,
default="16GB",
help="Specify the dGPU physical memory" " size",
help="Specify the dGPU physical memory size",
)
parser.add_argument(
"--dgpu-num-dirs",
type=int,
default=1,
help="Set " "the number of dGPU directories (memory controllers",
help="Set the number of dGPU directories (memory controllers",
)
parser.add_argument(
"--dgpu-mem-type",
@@ -125,6 +126,17 @@ def addRunFSOptions(parser):
help="type of memory to use",
)
# These are the models supported both by gem5 and by the ROCm versions
# that gem5 supports in full system mode. Other gfx versions have
# partial support in syscall emulation mode.
parser.add_argument(
"--gpu-device",
default="Vega10",
choices=["Vega10", "MI100", "MI200"],
help="GPU model to run: Vega10 (gfx900), MI100 (gfx908), or "
"MI200 (gfx90a)",
)
def runGpuFSSystem(args):
"""
@@ -145,6 +157,11 @@ def runGpuFSSystem(args):
math.ceil(float(n_cu) / args.cu_per_scalar_cache)
)
# Verify MMIO trace is valid
mmio_md5 = hashlib.md5(open(args.gpu_mmio_trace, "rb").read()).hexdigest()
if mmio_md5 != "c4ff3326ae8a036e329b8b595c83bd6d":
m5.util.panic("MMIO file does not match gem5 resources")
system = makeGpuFSSystem(args)
root = Root(
@@ -184,7 +201,7 @@ def runGpuFSSystem(args):
break
else:
print(
"Unknown exit event: %s. Continuing..." % exit_event.getCause()
f"Unknown exit event: {exit_event.getCause()}. Continuing..."
)
print(
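The MMIO-trace check added in this file hashes the trace and panics on a mismatch with the digest of the gem5-resources trace. A standalone sketch of that mechanism (file contents and digest here are examples, not the real trace):

```python
# Sketch of the MMIO-trace integrity check: hash the file and compare
# against a known-good digest; in the config a mismatch calls m5.util.panic.
import hashlib
import os
import tempfile

data = b"example mmio trace contents"       # illustrative stand-in
expected = hashlib.md5(data).hexdigest()    # known-good digest

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(data)
    path = f.name

with open(path, "rb") as f:
    actual = hashlib.md5(f.read()).hexdigest()

assert actual == expected  # would otherwise panic("MMIO file does not match")
os.remove(path)
print("trace verified")
```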

View File

@@ -170,3 +170,18 @@ def connectGPU(system, args):
system.pc.south_bridge.gpu.checkpoint_before_mmios = (
args.checkpoint_before_mmios
)
system.pc.south_bridge.gpu.device_name = args.gpu_device
if args.gpu_device == "MI100":
system.pc.south_bridge.gpu.DeviceID = 0x738C
system.pc.south_bridge.gpu.SubsystemVendorID = 0x1002
system.pc.south_bridge.gpu.SubsystemID = 0x0C34
elif args.gpu_device == "MI200":
system.pc.south_bridge.gpu.DeviceID = 0x740F
system.pc.south_bridge.gpu.SubsystemVendorID = 0x1002
system.pc.south_bridge.gpu.SubsystemID = 0x0C34
elif args.gpu_device == "Vega10":
system.pc.south_bridge.gpu.DeviceID = 0x6863
else:
panic("Unknown GPU device: {}".format(args.gpu_device))
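The if/elif chain above assigns PCI IDs per GPU model; the same data can be expressed as a lookup table. The ID values below are copied from the hunk (Vega10 keeps the default subsystem IDs):

```python
# PCI device/subsystem IDs per GPU model, as set in connectGPU above.
pci_ids = {
    "MI100": {"DeviceID": 0x738C, "SubsystemVendorID": 0x1002, "SubsystemID": 0x0C34},
    "MI200": {"DeviceID": 0x740F, "SubsystemVendorID": 0x1002, "SubsystemID": 0x0C34},
    "Vega10": {"DeviceID": 0x6863},
}

device = "MI100"
if device not in pci_ids:
    raise ValueError(f"Unknown GPU device: {device}")
print(hex(pci_ids[device]["DeviceID"]))
```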

View File

@@ -61,7 +61,9 @@ def makeGpuFSSystem(args):
panic("Need at least 2GB of system memory to load amdgpu module")
# Use the common FSConfig to setup a Linux X86 System
(TestCPUClass, test_mem_mode, FutureClass) = Simulation.setCPUClass(args)
(TestCPUClass, test_mem_mode) = Simulation.getCPUClass(args.cpu_type)
if test_mem_mode == "atomic":
test_mem_mode = "atomic_noncaching"
disks = [args.disk_image]
if args.second_disk is not None:
disks.extend([args.second_disk])
@@ -91,10 +93,11 @@ def makeGpuFSSystem(args):
# Create specified number of CPUs. GPUFS really only needs one.
system.cpu = [
X86KvmCPU(clk_domain=system.cpu_clk_domain, cpu_id=i)
TestCPUClass(clk_domain=system.cpu_clk_domain, cpu_id=i)
for i in range(args.num_cpus)
]
system.kvm_vm = KvmVM()
if ObjectList.is_kvm_cpu(TestCPUClass):
system.kvm_vm = KvmVM()
# Create AMDGPU and attach to southbridge
shader = createGPU(system, args)
@@ -112,7 +115,8 @@ def makeGpuFSSystem(args):
numHWQueues=args.num_hw_queues,
walker=hsapp_pt_walker,
)
dispatcher = GPUDispatcher()
dispatcher_exit_events = args.exit_at_gpu_kernel > -1
dispatcher = GPUDispatcher(kernel_exit_events=dispatcher_exit_events)
cp_pt_walker = VegaPagetableWalker()
gpu_cmd_proc = GPUCommandProcessor(
hsapp=gpu_hsapp, dispatcher=dispatcher, walker=cp_pt_walker
@@ -126,15 +130,55 @@ def makeGpuFSSystem(args):
device_ih = AMDGPUInterruptHandler()
system.pc.south_bridge.gpu.device_ih = device_ih
# Setup the SDMA engines
sdma0_pt_walker = VegaPagetableWalker()
sdma1_pt_walker = VegaPagetableWalker()
# Setup the SDMA engines depending on device. The MMIO base addresses
# can be found in the driver code under:
# include/asic_reg/sdmaX/sdmaX_Y_Z_offset.h
num_sdmas = 2
sdma_bases = []
sdma_sizes = []
if args.gpu_device == "Vega10":
num_sdmas = 2
sdma_bases = [0x4980, 0x5180]
sdma_sizes = [0x800] * 2
elif args.gpu_device == "MI100":
num_sdmas = 8
sdma_bases = [
0x4980,
0x6180,
0x78000,
0x79000,
0x7A000,
0x7B000,
0x7C000,
0x7D000,
]
sdma_sizes = [0x1000] * 8
elif args.gpu_device == "MI200":
num_sdmas = 5
sdma_bases = [
0x4980,
0x6180,
0x78000,
0x79000,
0x7A000,
]
sdma_sizes = [0x1000] * 5
else:
m5.util.panic(f"Unknown GPU device {args.gpu_device}")
sdma0 = SDMAEngine(walker=sdma0_pt_walker)
sdma1 = SDMAEngine(walker=sdma1_pt_walker)
sdma_pt_walkers = []
sdma_engines = []
for sdma_idx in range(num_sdmas):
sdma_pt_walker = VegaPagetableWalker()
sdma_engine = SDMAEngine(
walker=sdma_pt_walker,
mmio_base=sdma_bases[sdma_idx],
mmio_size=sdma_sizes[sdma_idx],
)
sdma_pt_walkers.append(sdma_pt_walker)
sdma_engines.append(sdma_engine)
system.pc.south_bridge.gpu.sdma0 = sdma0
system.pc.south_bridge.gpu.sdma1 = sdma1
system.pc.south_bridge.gpu.sdmas = sdma_engines
# Setup PM4 packet processor
pm4_pkt_proc = PM4PacketProcessor()
@@ -152,22 +196,22 @@ def makeGpuFSSystem(args):
system._dma_ports.append(gpu_hsapp)
system._dma_ports.append(gpu_cmd_proc)
system._dma_ports.append(system.pc.south_bridge.gpu)
system._dma_ports.append(sdma0)
system._dma_ports.append(sdma1)
for sdma in sdma_engines:
system._dma_ports.append(sdma)
system._dma_ports.append(device_ih)
system._dma_ports.append(pm4_pkt_proc)
system._dma_ports.append(system_hub)
system._dma_ports.append(gpu_mem_mgr)
system._dma_ports.append(hsapp_pt_walker)
system._dma_ports.append(cp_pt_walker)
system._dma_ports.append(sdma0_pt_walker)
system._dma_ports.append(sdma1_pt_walker)
for sdma_pt_walker in sdma_pt_walkers:
system._dma_ports.append(sdma_pt_walker)
gpu_hsapp.pio = system.iobus.mem_side_ports
gpu_cmd_proc.pio = system.iobus.mem_side_ports
system.pc.south_bridge.gpu.pio = system.iobus.mem_side_ports
sdma0.pio = system.iobus.mem_side_ports
sdma1.pio = system.iobus.mem_side_ports
for sdma in sdma_engines:
sdma.pio = system.iobus.mem_side_ports
device_ih.pio = system.iobus.mem_side_ports
pm4_pkt_proc.pio = system.iobus.mem_side_ports
system_hub.pio = system.iobus.mem_side_ports
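The hunk above replaces the fixed `sdma0`/`sdma1` pair with a per-device list of SDMA engines. The layout data from the diff, as a table (one engine per MMIO base; all engines of a device share the aperture size):

```python
# Per-device SDMA configuration from makeGpuFSSystem: MMIO base addresses
# and aperture sizes, values copied from the diff above.
sdma_layout = {
    "Vega10": ([0x4980, 0x5180], 0x800),
    "MI100": ([0x4980, 0x6180, 0x78000, 0x79000, 0x7A000,
               0x7B000, 0x7C000, 0x7D000], 0x1000),
    "MI200": ([0x4980, 0x6180, 0x78000, 0x79000, 0x7A000], 0x1000),
}

for device, (bases, size) in sdma_layout.items():
    print(device, len(bases), hex(size))

assert len(sdma_layout["MI100"][0]) == 8
assert len(sdma_layout["MI200"][0]) == 5
```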

View File

@@ -0,0 +1,153 @@
# Copyright (c) 2022-2023 Advanced Micro Devices, Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# 3. Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from this
# software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
import m5
import runfs
import base64
import tempfile
import argparse
import sys
import os
from amd import AmdGPUOptions
from common import Options
from common import GPUTLBOptions
from ruby import Ruby
demo_runscript_without_checkpoint = """\
export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH
export HSA_ENABLE_INTERRUPT=0
dmesg -n8
dd if=/root/roms/vega10.rom of=/dev/mem bs=1k seek=768 count=128
if [ ! -f /lib/modules/`uname -r`/updates/dkms/amdgpu.ko ]; then
echo "ERROR: Missing DKMS package for kernel `uname -r`. Exiting gem5."
/sbin/m5 exit
fi
modprobe -v amdgpu ip_block_mask=0xff ppfeaturemask=0 dpm=0 audio=0
echo "Running {} {}"
echo "{}" | base64 -d > myapp
chmod +x myapp
./myapp {}
/sbin/m5 exit
"""
demo_runscript_with_checkpoint = """\
export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH
export HSA_ENABLE_INTERRUPT=0
dmesg -n8
dd if=/root/roms/vega10.rom of=/dev/mem bs=1k seek=768 count=128
if [ ! -f /lib/modules/`uname -r`/updates/dkms/amdgpu.ko ]; then
echo "ERROR: Missing DKMS package for kernel `uname -r`. Exiting gem5."
/sbin/m5 exit
fi
modprobe -v amdgpu ip_block_mask=0xff ppfeaturemask=0 dpm=0 audio=0
echo "Running {} {}"
echo "{}" | base64 -d > myapp
chmod +x myapp
/sbin/m5 checkpoint
./myapp {}
/sbin/m5 exit
"""
def addDemoOptions(parser):
parser.add_argument(
"-a", "--app", default=None, help="GPU application to run"
)
parser.add_argument(
"-o", "--opts", default="", help="GPU application arguments"
)
def runVegaGPUFS(cpu_type):
parser = argparse.ArgumentParser()
runfs.addRunFSOptions(parser)
Options.addCommonOptions(parser)
AmdGPUOptions.addAmdGPUOptions(parser)
Ruby.define_options(parser)
GPUTLBOptions.tlb_options(parser)
addDemoOptions(parser)
# Parse now so we can override options
args = parser.parse_args()
demo_runscript = ""
# Create temp script to run application
if args.app is None:
print(f"No application given. Use {sys.argv[0]} -a <app>")
sys.exit(1)
elif args.kernel is None:
print(f"No kernel path given. Use {sys.argv[0]} --kernel <vmlinux>")
sys.exit(1)
elif args.disk_image is None:
print(f"No disk path given. Use {sys.argv[0]} --disk-image <linux>")
sys.exit(1)
elif args.gpu_mmio_trace is None:
print(f"No MMIO trace path. Use {sys.argv[0]} --gpu-mmio-trace <path>")
sys.exit(1)
elif not os.path.isfile(args.app):
print("Could not find applcation", args.app)
sys.exit(1)
# Choose runscript based on whether any checkpointing args are set
if args.checkpoint_dir is not None:
demo_runscript = demo_runscript_with_checkpoint
else:
demo_runscript = demo_runscript_without_checkpoint
with open(os.path.abspath(args.app), "rb") as binfile:
encodedBin = base64.b64encode(binfile.read()).decode()
_, tempRunscript = tempfile.mkstemp()
with open(tempRunscript, "w") as b64file:
runscriptStr = demo_runscript.format(
args.app, args.opts, encodedBin, args.opts
)
b64file.write(runscriptStr)
if args.second_disk is None:
args.second_disk = args.disk_image
# Defaults for Vega10
args.ruby = True
args.cpu_type = cpu_type
args.num_cpus = 1
args.mem_size = "3GB"
args.dgpu = True
args.dgpu_mem_size = "16GB"
args.dgpu_start = "0GB"
args.checkpoint_restore = 0
args.disjoint = True
args.timing_gpu = True
args.script = tempRunscript
args.dgpu_xor_low_bit = 0
# Run gem5
runfs.runGpuFSSystem(args)
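The demo script above ships the GPU application into the guest by base64-encoding the binary into the runscript, where `base64 -d` reconstructs it. A minimal round-trip sketch of that mechanism (the binary bytes are fake):

```python
# Embed a binary into a shell runscript via base64, as the demo runscripts do;
# decoding on the guest side recovers the original bytes exactly.
import base64

binary = b"\x7fELF fake application bytes"  # illustrative stand-in
encoded = base64.b64encode(binary).decode()

runscript = f'echo "{encoded}" | base64 -d > myapp\nchmod +x myapp\n./myapp\n'

assert base64.b64decode(encoded) == binary
print("round-trip ok")
```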

View File

@@ -0,0 +1,32 @@
# Copyright (c) 2023 Advanced Micro Devices, Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# 3. Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from this
# software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
import vega10
vega10.runVegaGPUFS("AtomicSimpleCPU")

View File

@@ -1,4 +1,4 @@
# Copyright (c) 2022 Advanced Micro Devices, Inc.
# Copyright (c) 2022-2023 Advanced Micro Devices, Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
@@ -27,104 +27,6 @@
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
import m5
import runfs
import base64
import tempfile
import argparse
import sys
import os
import vega10
from amd import AmdGPUOptions
from common import Options
from common import GPUTLBOptions
from ruby import Ruby
demo_runscript = """\
export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH
export HSA_ENABLE_INTERRUPT=0
dmesg -n3
dd if=/root/roms/vega10.rom of=/dev/mem bs=1k seek=768 count=128
if [ ! -f /lib/modules/`uname -r`/updates/dkms/amdgpu.ko ]; then
echo "ERROR: Missing DKMS package for kernel `uname -r`. Exiting gem5."
/sbin/m5 exit
fi
modprobe -v amdgpu ip_block_mask=0xff ppfeaturemask=0 dpm=0 audio=0
echo "Running {} {}"
echo "{}" | base64 -d > myapp
chmod +x myapp
./myapp {}
/sbin/m5 exit
"""
def addDemoOptions(parser):
parser.add_argument(
"-a", "--app", default=None, help="GPU application to run"
)
parser.add_argument(
"-o", "--opts", default="", help="GPU application arguments"
)
if __name__ == "__m5_main__":
parser = argparse.ArgumentParser()
runfs.addRunFSOptions(parser)
Options.addCommonOptions(parser)
AmdGPUOptions.addAmdGPUOptions(parser)
Ruby.define_options(parser)
GPUTLBOptions.tlb_options(parser)
addDemoOptions(parser)
# Parse now so we can override options
args = parser.parse_args()
# Create temp script to run application
if args.app is None:
print("No application given. Use %s -a <app>" % sys.argv[0])
sys.exit(1)
elif args.kernel is None:
print("No kernel path given. Use %s --kernel <vmlinux>" % sys.argv[0])
sys.exit(1)
elif args.disk_image is None:
print("No disk path given. Use %s --disk-image <linux>" % sys.argv[0])
sys.exit(1)
elif args.gpu_mmio_trace is None:
print(
"No MMIO trace path. Use %s --gpu-mmio-trace <path>" % sys.argv[0]
)
sys.exit(1)
elif not os.path.isfile(args.app):
print("Could not find applcation", args.app)
sys.exit(1)
with open(os.path.abspath(args.app), "rb") as binfile:
encodedBin = base64.b64encode(binfile.read()).decode()
_, tempRunscript = tempfile.mkstemp()
with open(tempRunscript, "w") as b64file:
runscriptStr = demo_runscript.format(
args.app, args.opts, encodedBin, args.opts
)
b64file.write(runscriptStr)
if args.second_disk == None:
args.second_disk = args.disk_image
# Defaults for Vega10
args.ruby = True
args.cpu_type = "X86KvmCPU"
args.num_cpus = 1
args.mem_size = "3GB"
args.dgpu = True
args.dgpu_mem_size = "16GB"
args.dgpu_start = "0GB"
args.checkpoint_restore = 0
args.disjoint = True
args.timing_gpu = True
args.script = tempRunscript
args.dgpu_xor_low_bit = 0
# Run gem5
runfs.runGpuFSSystem(args)
vega10.runVegaGPUFS("X86KvmCPU")

View File

@@ -118,11 +118,11 @@ def createVegaTopology(options):
# Populate CPU node properties
node_prop = (
"cpu_cores_count %s\n" % options.num_cpus
f"cpu_cores_count {options.num_cpus}\n"
+ "simd_count 0\n"
+ "mem_banks_count 1\n"
+ "caches_count 0\n"
+ "io_links_count %s\n" % io_links
+ f"io_links_count {io_links}\n"
+ "cpu_core_id_base 0\n"
+ "simd_id_base 0\n"
+ "max_waves_per_simd 0\n"
@@ -200,8 +200,8 @@ def createVegaTopology(options):
"cpu_cores_count 0\n"
+ "simd_count 256\n"
+ "mem_banks_count 1\n"
+ "caches_count %s\n" % caches
+ "io_links_count %s\n" % io_links
+ f"caches_count {caches}\n"
+ f"io_links_count {io_links}\n"
+ "cpu_core_id_base 0\n"
+ "simd_id_base 2147487744\n"
+ "max_waves_per_simd 10\n"
@@ -212,11 +212,11 @@ def createVegaTopology(options):
+ "simd_arrays_per_engine 1\n"
+ "cu_per_simd_array 16\n"
+ "simd_per_cu 4\n"
+ "max_slots_scratch_cu %s\n" % cu_scratch
+ f"max_slots_scratch_cu {cu_scratch}\n"
+ "vendor_id 4098\n"
+ "device_id 26720\n"
+ "location_id 1024\n"
+ "drm_render_minor %s\n" % drm_num
+ f"drm_render_minor {drm_num}\n"
+ "hive_id 0\n"
+ "num_sdma_engines 2\n"
+ "num_sdma_xgmi_engines 0\n"
@@ -313,11 +313,11 @@ def createFijiTopology(options):
# Populate CPU node properties
node_prop = (
"cpu_cores_count %s\n" % options.num_cpus
f"cpu_cores_count {options.num_cpus}\n"
+ "simd_count 0\n"
+ "mem_banks_count 1\n"
+ "caches_count 0\n"
+ "io_links_count %s\n" % io_links
+ f"io_links_count {io_links}\n"
+ "cpu_core_id_base 0\n"
+ "simd_id_base 0\n"
+ "max_waves_per_simd 0\n"
@@ -392,33 +392,30 @@ def createFijiTopology(options):
# Populate GPU node properties
node_prop = (
"cpu_cores_count 0\n"
+ "simd_count %s\n"
% (options.num_compute_units * options.simds_per_cu)
+ f"simd_count {options.num_compute_units * options.simds_per_cu}\n"
+ "mem_banks_count 1\n"
+ "caches_count %s\n" % caches
+ "io_links_count %s\n" % io_links
+ f"caches_count {caches}\n"
+ f"io_links_count {io_links}\n"
+ "cpu_core_id_base 0\n"
+ "simd_id_base 2147487744\n"
+ "max_waves_per_simd %s\n" % options.wfs_per_simd
+ "lds_size_in_kb %s\n" % int(options.lds_size / 1024)
+ f"max_waves_per_simd {options.wfs_per_simd}\n"
+ f"lds_size_in_kb {int(options.lds_size / 1024)}\n"
+ "gds_size_in_kb 0\n"
+ "wave_front_size %s\n" % options.wf_size
+ f"wave_front_size {options.wf_size}\n"
+ "array_count 4\n"
+ "simd_arrays_per_engine %s\n" % options.sa_per_complex
+ "cu_per_simd_array %s\n" % options.cu_per_sa
+ "simd_per_cu %s\n" % options.simds_per_cu
+ f"simd_arrays_per_engine {options.sa_per_complex}\n"
+ f"cu_per_simd_array {options.cu_per_sa}\n"
+ f"simd_per_cu {options.simds_per_cu}\n"
+ "max_slots_scratch_cu 32\n"
+ "vendor_id 4098\n"
+ "device_id 29440\n"
+ "location_id 512\n"
+ "drm_render_minor %s\n" % drm_num
+ "max_engine_clk_fcompute %s\n"
% int(toFrequency(options.gpu_clock) / 1e6)
+ f"drm_render_minor {drm_num}\n"
+ f"max_engine_clk_fcompute {int(toFrequency(options.gpu_clock) / 1000000.0)}\n"
+ "local_mem_size 4294967296\n"
+ "fw_version 730\n"
+ "capability 4736\n"
+ "max_engine_clk_ccompute %s\n"
% int(toFrequency(options.CPUClock) / 1e6)
+ f"max_engine_clk_ccompute {int(toFrequency(options.CPUClock) / 1000000.0)}\n"
)
file_append((node_dir, "properties"), node_prop)
@@ -484,34 +481,31 @@ def createCarrizoTopology(options):
# populate global node properties
# NOTE: SIMD count triggers a valid GPU agent creation
node_prop = (
"cpu_cores_count %s\n" % options.num_cpus
+ "simd_count %s\n"
% (options.num_compute_units * options.simds_per_cu)
+ "mem_banks_count %s\n" % mem_banks_cnt
f"cpu_cores_count {options.num_cpus}\n"
+ f"simd_count {options.num_compute_units * options.simds_per_cu}\n"
+ f"mem_banks_count {mem_banks_cnt}\n"
+ "caches_count 0\n"
+ "io_links_count 0\n"
+ "cpu_core_id_base 16\n"
+ "simd_id_base 2147483648\n"
+ "max_waves_per_simd %s\n" % options.wfs_per_simd
+ "lds_size_in_kb %s\n" % int(options.lds_size / 1024)
+ f"max_waves_per_simd {options.wfs_per_simd}\n"
+ f"lds_size_in_kb {int(options.lds_size / 1024)}\n"
+ "gds_size_in_kb 0\n"
+ "wave_front_size %s\n" % options.wf_size
+ f"wave_front_size {options.wf_size}\n"
+ "array_count 1\n"
+ "simd_arrays_per_engine %s\n" % options.sa_per_complex
+ "cu_per_simd_array %s\n" % options.cu_per_sa
+ "simd_per_cu %s\n" % options.simds_per_cu
+ f"simd_arrays_per_engine {options.sa_per_complex}\n"
+ f"cu_per_simd_array {options.cu_per_sa}\n"
+ f"simd_per_cu {options.simds_per_cu}\n"
+ "max_slots_scratch_cu 32\n"
+ "vendor_id 4098\n"
+ "device_id %s\n" % device_id
+ f"device_id {device_id}\n"
+ "location_id 8\n"
+ "drm_render_minor %s\n" % drm_num
+ "max_engine_clk_fcompute %s\n"
% int(toFrequency(options.gpu_clock) / 1e6)
+ f"drm_render_minor {drm_num}\n"
+ f"max_engine_clk_fcompute {int(toFrequency(options.gpu_clock) / 1000000.0)}\n"
+ "local_mem_size 0\n"
+ "fw_version 699\n"
+ "capability 4738\n"
+ "max_engine_clk_ccompute %s\n"
% int(toFrequency(options.CPUClock) / 1e6)
+ f"max_engine_clk_ccompute {int(toFrequency(options.CPUClock) / 1000000.0)}\n"
)
file_append((node_dir, "properties"), node_prop)
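The `node_prop` strings above are long `+`-concatenated sequences of `"key value\n"` pairs written to the emulated KFD topology. The same string can be built from a dict; the keys below are a subset from the hunk and the values are illustrative:

```python
# Build a KFD topology "properties" string from a key/value table instead of
# string concatenation (subset of keys, illustrative values).
props = {
    "cpu_cores_count": 1,
    "simd_count": 0,
    "mem_banks_count": 1,
    "io_links_count": 1,
}
node_prop = "".join(f"{key} {value}\n" for key, value in props.items())
print(node_prop, end="")
```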

View File

@@ -113,6 +113,4 @@ print("Beginning simulation!")
exit_event = m5.simulate(args.max_ticks)
print(
"Exiting @ tick {} because {}.".format(m5.curTick(), exit_event.getCause())
)
print(f"Exiting @ tick {m5.curTick()} because {exit_event.getCause()}.")

View File

@@ -330,7 +330,7 @@ def make_cache_level(ncaches, prototypes, level, next_cache):
make_cache_level(cachespec, cache_proto, len(cachespec), None)
# Connect the lowest level crossbar to the memory
last_subsys = getattr(system, "l%dsubsys0" % len(cachespec))
last_subsys = getattr(system, f"l{len(cachespec)}subsys0")
last_subsys.xbar.mem_side_ports = system.physmem.port
last_subsys.xbar.point_of_coherency = True

View File

@@ -211,8 +211,7 @@ else:
if numtesters(cachespec, testerspec) > block_size:
print(
"Error: Limited to %s testers because of false sharing"
% (block_size)
f"Error: Limited to {block_size} testers because of false sharing"
)
sys.exit(1)
@@ -351,7 +350,7 @@ make_cache_level(cachespec, cache_proto, len(cachespec), None)
# Connect the lowest level crossbar to the last-level cache and memory
# controller
last_subsys = getattr(system, "l%dsubsys0" % len(cachespec))
last_subsys = getattr(system, f"l{len(cachespec)}subsys0")
last_subsys.xbar.point_of_coherency = True
if args.noncoherent_cache:
system.llc = NoncoherentCache(

View File

@@ -68,8 +68,7 @@ sim_object_classes_by_name = {
def no_parser(cls, flags, param):
raise Exception(
"Can't parse string: %s for parameter"
" class: %s" % (str(param), cls.__name__)
f"Can't parse string: {str(param)} for parameter class: {cls.__name__}"
)
@@ -114,7 +113,7 @@ def memory_bandwidth_parser(cls, flags, param):
value = 1.0 / float(param)
# Convert to byte/s
value = ticks.fromSeconds(value)
return cls("%fB/s" % value)
return cls(f"{value:f}B/s")
# These parameters have trickier parsing from .ini files than might be
@@ -201,8 +200,7 @@ class ConfigManager(object):
if object_type not in sim_object_classes_by_name:
raise Exception(
"No SimObject type %s is available to"
" build: %s" % (object_type, object_name)
f"No SimObject type {object_type} is available to build: {object_name}"
)
object_class = sim_object_classes_by_name[object_type]
@@ -479,7 +477,7 @@ class ConfigIniFile(ConfigFile):
if object_name == "root":
return child_name
else:
return "%s.%s" % (object_name, child_name)
return f"{object_name}.{child_name}"
return [(name, make_path(name)) for name in child_names]
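The `memory_bandwidth_parser` above reads bandwidth stored in the `.ini` file as seconds-per-byte and inverts it back to bytes-per-second. A standalone sketch of that inversion (function name hypothetical; the real parser also converts through gem5 ticks):

```python
# Invert a seconds-per-byte latency back into a bytes-per-second string,
# mirroring memory_bandwidth_parser without the ticks.fromSeconds step.
def parse_bandwidth(seconds_per_byte: str) -> str:
    bytes_per_second = 1.0 / float(seconds_per_byte)
    return f"{bytes_per_second:f}B/s"

print(parse_bandwidth("1e-9"))  # 1 ns per byte -> 1 GB/s
```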

View File

@@ -91,7 +91,7 @@ from common import Options
def generateMemNode(state, mem_range):
node = FdtNode("memory@%x" % int(mem_range.start))
node = FdtNode(f"memory@{int(mem_range.start):x}")
node.append(FdtPropertyStrings("device_type", ["memory"]))
node.append(
FdtPropertyWords(
@@ -187,6 +187,7 @@ system.platform = HiFive()
# RTCCLK (Set to 100MHz for faster simulation)
system.platform.rtc = RiscvRTC(frequency=Frequency("100MHz"))
system.platform.clint.int_pin = system.platform.rtc.int_pin
system.platform.pci_host.pio = system.iobus.mem_side_ports
# VirtIOMMIO
if args.disk_image:
@@ -236,8 +237,6 @@ system.cpu_clk_domain = SrcClockDomain(
clock=args.cpu_clock, voltage_domain=system.cpu_voltage_domain
)
system.workload.object_file = args.kernel
# NOTE: Not yet tested
if args.script is not None:
system.readfile = args.script

View File

@@ -1,16 +1,4 @@
# Copyright (c) 2012-2013 ARM Limited
# All rights reserved.
#
# The license below extends only to copyright in the software and shall
# not be construed as granting a license to any other intellectual
# property including but not limited to intellectual property relating
# to a hardware implementation of the functionality of the software
# licensed hereunder. You may use the software subject to the license
# terms below provided that you ensure that this notice is replicated
# unmodified and in its entirety in all distributions of the software,
# modified or unmodified, in source code or in binary form.
#
# Copyright (c) 2006-2008 The Regents of The University of Michigan
# Copyright (c) 2023 The Regents of the University of California
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
@@ -36,253 +24,10 @@
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Simple test script
#
# "m5 test.py"
from m5.util import fatal
import argparse
import sys
import os
import m5
from m5.defines import buildEnv
from m5.objects import *
from m5.params import NULL
from m5.util import addToPath, fatal, warn
from gem5.isas import ISA
from gem5.runtime import get_runtime_isa
addToPath("../")
from ruby import Ruby
from common import Options
from common import Simulation
from common import CacheConfig
from common import CpuConfig
from common import ObjectList
from common import MemConfig
from common.FileSystemConfig import config_filesystem
from common.Caches import *
from common.cpu2000 import *
def get_processes(args):
"""Interprets provided args and returns a list of processes"""
multiprocesses = []
inputs = []
outputs = []
errouts = []
pargs = []
workloads = args.cmd.split(";")
if args.input != "":
inputs = args.input.split(";")
if args.output != "":
outputs = args.output.split(";")
if args.errout != "":
errouts = args.errout.split(";")
if args.options != "":
pargs = args.options.split(";")
idx = 0
for wrkld in workloads:
process = Process(pid=100 + idx)
process.executable = wrkld
process.cwd = os.getcwd()
process.gid = os.getgid()
if args.env:
with open(args.env, "r") as f:
process.env = [line.rstrip() for line in f]
if len(pargs) > idx:
process.cmd = [wrkld] + pargs[idx].split()
else:
process.cmd = [wrkld]
if len(inputs) > idx:
process.input = inputs[idx]
if len(outputs) > idx:
process.output = outputs[idx]
if len(errouts) > idx:
process.errout = errouts[idx]
multiprocesses.append(process)
idx += 1
if args.smt:
assert args.cpu_type == "DerivO3CPU"
return multiprocesses, idx
else:
return multiprocesses, 1
parser = argparse.ArgumentParser()
Options.addCommonOptions(parser)
Options.addSEOptions(parser)
if "--ruby" in sys.argv:
Ruby.define_options(parser)
args = parser.parse_args()
multiprocesses = []
numThreads = 1
if args.bench:
apps = args.bench.split("-")
if len(apps) != args.num_cpus:
print("number of benchmarks not equal to set num_cpus!")
sys.exit(1)
for app in apps:
try:
if get_runtime_isa() == ISA.ARM:
exec(
"workload = %s('arm_%s', 'linux', '%s')"
% (app, args.arm_iset, args.spec_input)
)
else:
# TARGET_ISA has been removed, but this is missing a ], so it
# has incorrect syntax and wasn't being used anyway.
exec(
"workload = %s(buildEnv['TARGET_ISA', 'linux', '%s')"
% (app, args.spec_input)
)
multiprocesses.append(workload.makeProcess())
except:
print(
"Unable to find workload for %s: %s"
% (get_runtime_isa().name(), app),
file=sys.stderr,
)
sys.exit(1)
elif args.cmd:
multiprocesses, numThreads = get_processes(args)
else:
print("No workload specified. Exiting!\n", file=sys.stderr)
sys.exit(1)
(CPUClass, test_mem_mode, FutureClass) = Simulation.setCPUClass(args)
CPUClass.numThreads = numThreads
# Check -- do not allow SMT with multiple CPUs
if args.smt and args.num_cpus > 1:
fatal("You cannot use SMT with multiple CPUs!")
np = args.num_cpus
mp0_path = multiprocesses[0].executable
system = System(
cpu=[CPUClass(cpu_id=i) for i in range(np)],
mem_mode=test_mem_mode,
mem_ranges=[AddrRange(args.mem_size)],
cache_line_size=args.cacheline_size,
fatal(
"The 'configs/example/se.py' script has been deprecated. It can be "
"found in 'configs/deprecated/example' if required. Its usage should be "
"avoided as it will be removed in future releases of gem5."
)
if numThreads > 1:
system.multi_thread = True
# Create a top-level voltage domain
system.voltage_domain = VoltageDomain(voltage=args.sys_voltage)
# Create a source clock for the system and set the clock period
system.clk_domain = SrcClockDomain(
clock=args.sys_clock, voltage_domain=system.voltage_domain
)
# Create a CPU voltage domain
system.cpu_voltage_domain = VoltageDomain()
# Create a separate clock domain for the CPUs
system.cpu_clk_domain = SrcClockDomain(
clock=args.cpu_clock, voltage_domain=system.cpu_voltage_domain
)
# If elastic tracing is enabled, then configure the cpu and attach the elastic
# trace probe
if args.elastic_trace_en:
CpuConfig.config_etrace(CPUClass, system.cpu, args)
# All cpus belong to a common cpu_clk_domain, therefore running at a common
# frequency.
for cpu in system.cpu:
cpu.clk_domain = system.cpu_clk_domain
if ObjectList.is_kvm_cpu(CPUClass) or ObjectList.is_kvm_cpu(FutureClass):
if buildEnv["USE_X86_ISA"]:
system.kvm_vm = KvmVM()
system.m5ops_base = 0xFFFF0000
for process in multiprocesses:
process.useArchPT = True
process.kvmInSE = True
else:
fatal("KvmCPU can only be used in SE mode with x86")
# Sanity check
if args.simpoint_profile:
if not ObjectList.is_noncaching_cpu(CPUClass):
fatal("SimPoint/BPProbe should be done with an atomic cpu")
if np > 1:
fatal("SimPoint generation not supported with more than one CPUs")
for i in range(np):
if args.smt:
system.cpu[i].workload = multiprocesses
elif len(multiprocesses) == 1:
system.cpu[i].workload = multiprocesses[0]
else:
system.cpu[i].workload = multiprocesses[i]
if args.simpoint_profile:
system.cpu[i].addSimPointProbe(args.simpoint_interval)
if args.checker:
system.cpu[i].addCheckerCpu()
if args.bp_type:
bpClass = ObjectList.bp_list.get(args.bp_type)
system.cpu[i].branchPred = bpClass()
if args.indirect_bp_type:
indirectBPClass = ObjectList.indirect_bp_list.get(
args.indirect_bp_type
)
system.cpu[i].branchPred.indirectBranchPred = indirectBPClass()
system.cpu[i].createThreads()
if args.ruby:
Ruby.create_system(args, False, system)
assert args.num_cpus == len(system.ruby._cpu_ports)
system.ruby.clk_domain = SrcClockDomain(
clock=args.ruby_clock, voltage_domain=system.voltage_domain
)
for i in range(np):
ruby_port = system.ruby._cpu_ports[i]
# Create the interrupt controller and connect its ports to Ruby
# Note that the interrupt controller is always present but only
# in x86 does it have message ports that need to be connected
system.cpu[i].createInterruptController()
# Connect the cpu's cache ports to Ruby
ruby_port.connectCpuPorts(system.cpu[i])
else:
MemClass = Simulation.setMemClass(args)
system.membus = SystemXBar()
system.system_port = system.membus.cpu_side_ports
CacheConfig.config_cache(args, system)
MemConfig.config_mem(args, system)
config_filesystem(system, args)
system.workload = SEWorkload.init_compatible(mp0_path)
if args.wait_gdb:
system.workload.wait_for_remote_gdb = True
root = Root(full_system=False, system=system)
Simulation.run(args, root, system, FutureClass)

View File

@@ -35,7 +35,7 @@ import argparse
def generateMemNode(state, mem_range):
node = FdtNode("memory@%x" % int(mem_range.start))
node = FdtNode(f"memory@{int(mem_range.start):x}")
node.append(FdtPropertyStrings("device_type", ["memory"]))
node.append(
FdtPropertyWords(

View File

@@ -75,7 +75,7 @@ class L1ICache(L1Cache):
size = "16kB"
SimpleOpts.add_option(
"--l1i_size", help="L1 instruction cache size. Default: %s" % size
"--l1i_size", help=f"L1 instruction cache size. Default: {size}"
)
def __init__(self, opts=None):
@@ -96,7 +96,7 @@ class L1DCache(L1Cache):
size = "64kB"
SimpleOpts.add_option(
"--l1d_size", help="L1 data cache size. Default: %s" % size
"--l1d_size", help=f"L1 data cache size. Default: {size}"
)
def __init__(self, opts=None):
@@ -122,9 +122,7 @@ class L2Cache(Cache):
mshrs = 20
tgts_per_mshr = 12
SimpleOpts.add_option(
"--l2_size", help="L2 cache size. Default: %s" % size
)
SimpleOpts.add_option("--l2_size", help=f"L2 cache size. Default: {size}")
def __init__(self, opts=None):
super(L2Cache, self).__init__()

View File

@@ -78,6 +78,4 @@ m5.instantiate()
print("Beginning simulation!")
exit_event = m5.simulate()
print(
"Exiting @ tick {} because {}".format(m5.curTick(), exit_event.getCause())
)
print(f"Exiting @ tick {m5.curTick()} because {exit_event.getCause()}")


@@ -110,6 +110,4 @@ m5.instantiate()
print("Beginning simulation!")
exit_event = m5.simulate()
print(
"Exiting @ tick {} because {}".format(m5.curTick(), exit_event.getCause())
)
print(f"Exiting @ tick {m5.curTick()} because {exit_event.getCause()}")


@@ -280,6 +280,6 @@ def create_system(
elif options.topology in ["Crossbar", "Pt2Pt"]:
topology = create_topology(network_cntrls, options)
else:
m5.fatal("%s not supported!" % options.topology)
m5.fatal(f"{options.topology} not supported!")
return (cpu_sequencers, mem_cntrls, topology)


@@ -428,7 +428,7 @@ class CPUSequencerWrapper:
cpu.icache_port = self.inst_seq.in_ports
for p in cpu._cached_ports:
if str(p) != "icache_port":
exec("cpu.%s = self.data_seq.in_ports" % p)
exec(f"cpu.{p} = self.data_seq.in_ports")
cpu.connectUncachedPorts(
self.data_seq.in_ports, self.data_seq.interrupt_out_port
)


@@ -120,8 +120,8 @@ def define_options(parser):
)
protocol = buildEnv["PROTOCOL"]
exec("from . import %s" % protocol)
eval("%s.define_options(parser)" % protocol)
exec(f"from . import {protocol}")
eval(f"{protocol}.define_options(parser)")
Network.define_options(parser)
@@ -207,8 +207,8 @@ def create_topology(controllers, options):
found in configs/topologies/BaseTopology.py
This is a wrapper for the legacy topologies.
"""
exec("import topologies.%s as Topo" % options.topology)
topology = eval("Topo.%s(controllers)" % options.topology)
exec(f"import topologies.{options.topology} as Topo")
topology = eval(f"Topo.{options.topology}(controllers)")
return topology
@@ -242,7 +242,7 @@ def create_system(
cpus = system.cpu
protocol = buildEnv["PROTOCOL"]
exec("from . import %s" % protocol)
exec(f"from . import {protocol}")
try:
(cpu_sequencers, dir_cntrls, topology) = eval(
"%s.create_system(options, full_system, system, dma_ports,\
@@ -250,7 +250,7 @@ def create_system(
% protocol
)
except:
print("Error: could not create system for ruby protocol %s" % protocol)
print(f"Error: could not create system for ruby protocol {protocol}")
raise
# Create the network topology
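The string-building exec/eval pattern converted to f-strings above can also be avoided entirely with importlib plus getattr; a minimal sketch (the module and function names below are illustrative stand-ins, not gem5 code):

```python
import importlib

def call_by_name(module_name, func_name, *args):
    # Dynamic import by name, replacing exec(f"from . import {protocol}")
    mod = importlib.import_module(module_name)
    # Attribute lookup, replacing eval(f"{protocol}.define_options(parser)")
    return getattr(mod, func_name)(*args)

# Stand-in: the stdlib "math" module plays the role of a protocol module here.
print(call_by_name("math", "sqrt", 16.0))  # → 4.0
```

This keeps the dynamic-dispatch behavior while avoiding evaluation of constructed code strings.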


@@ -325,9 +325,7 @@ class CustomMesh(SimpleTopology):
rni_io_params = check_same(type(n).NoC_Params, rni_io_params)
else:
fatal(
"topologies.CustomMesh: {} not supported".format(
n.__class__.__name__
)
f"topologies.CustomMesh: {n.__class__.__name__} not supported"
)
# Create all mesh routers
@@ -420,11 +418,11 @@ class CustomMesh(SimpleTopology):
if pair_debug:
print(c.path())
for r in c.addr_ranges:
print("%s" % r)
print(f"{r}")
for p in c._pairing:
print("\t" + p.path())
for r in p.addr_ranges:
print("\t%s" % r)
print(f"\t{r}")
# all must be paired
for c in all_cache:
@@ -516,8 +514,8 @@ class CustomMesh(SimpleTopology):
assert len(c._pairing) == pairing_check
print(c.path())
for r in c.addr_ranges:
print("%s" % r)
print(f"{r}")
for p in c._pairing:
print("\t" + p.path())
for r in p.addr_ranges:
print("\t%s" % r)
print(f"\t{r}")


@@ -41,7 +41,7 @@ import os
Import('env')
env.Prepend(CPPPATH=Dir('./src'))
env.Prepend(CPPPATH=Dir('./src').srcnode())
# Add the appropriate files for the library
drampower_files = []


@@ -59,7 +59,7 @@ DRAMFile('AddressMapping.cpp')
DRAMFile('Bank.cpp')
DRAMFile('BankState.cpp')
DRAMFile('BusPacket.cpp')
DRAMFile('ClockDoenv.cpp')
DRAMFile('ClockDomain.cpp')
DRAMFile('CommandQueue.cpp')
DRAMFile('IniReader.cpp')
DRAMFile('MemoryController.cpp')
@@ -85,6 +85,6 @@ dramenv.Append(CCFLAGS=['-DNO_STORAGE'])
dramenv.Library('dramsim2', [dramenv.SharedObject(f) for f in dram_files])
env.Prepend(CPPPATH=Dir('.'))
env.Prepend(CPPPATH=Dir('.').srcnode())
env.Append(LIBS=['dramsim2'])
env.Prepend(LIBPATH=[Dir('.')])


@@ -56,12 +56,12 @@ dramsim_path = os.path.join(Dir('#').abspath, 'ext/dramsim3/DRAMsim3/')
if thermal:
superlu_path = os.path.join(dramsim_path, 'ext/SuperLU_MT_3.1/lib')
env.Prepend(CPPPATH=Dir('.'))
env.Prepend(CPPPATH=Dir('.').srcnode())
env.Append(LIBS=['dramsim3', 'superlu_mt_OPENMP', 'm', 'f77blas',
'atlas', 'gomp'],
LIBPATH=[dramsim_path, superlu_path])
else:
env.Prepend(CPPPATH=Dir('.'))
env.Prepend(CPPPATH=Dir('.').srcnode())
# a little hacky, but gets a shared library working
env.Append(LIBS=['dramsim3', 'gomp'],
LIBPATH=[dramsim_path], # compile-time lookup

ext/dramsys/README (new file)

@@ -0,0 +1,10 @@
Follow these steps to get DRAMSys as part of gem5
1. Go to ext/dramsys (this directory)
2. Clone DRAMSys: 'git clone --recursive git@github.com:tukl-msd/DRAMSys.git DRAMSys'
3. Change directory to DRAMSys: 'cd DRAMSys'
4. Checkout the correct commit: 'git checkout -b gem5 09f6dcbb91351e6ee7cadfc7bc8b29d97625db8f'
If you wish to run a simulation using the gem5 processor cores, make sure to enable the storage mode in DRAMSys.
This is done by setting the value of the "StoreMode" key to "Store" in the base configuration file.
These configuration files can be found in 'DRAMSys/library/resources/configs/simulator'.

ext/dramsys/SConscript (new file)

@@ -0,0 +1,96 @@
# Copyright (c) 2022 Fraunhofer IESE
# All rights reserved
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer;
# redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution;
# neither the name of the copyright holders nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
import os
Import('env')
build_root = Dir('../..').abspath
src_root = Dir('DRAMSys/DRAMSys/library').srcnode().abspath
# See if we got a cloned DRAMSys repo as a subdirectory and set the
# HAVE_DRAMSys flag accordingly
if not os.path.exists(Dir('.').srcnode().abspath + '/DRAMSys'):
env['HAVE_DRAMSYS'] = False
Return()
env['HAVE_DRAMSYS'] = True
dramsys_files = []
dramsys_configuration_files = []
dramsys_files.extend(Glob("%s/*.cpp" % f"{src_root}/src/controller"))
for root, dirs, files in os.walk(f"{src_root}/src/controller", topdown=False):
for dir in dirs:
dramsys_files.extend(Glob("%s/*.cpp" % os.path.join(root, dir)))
dramsys_files.extend(Glob("%s/*.cpp" % f"{src_root}/src/simulation"))
for root, dirs, files in os.walk(f"{src_root}/src/simulation", topdown=False):
for dir in dirs:
dramsys_files.extend(Glob("%s/*.cpp" % os.path.join(root, dir)))
dramsys_files.extend(Glob("%s/*.cpp" % f"{src_root}/src/configuration"))
for root, dirs, files in os.walk(f"{src_root}/src/configuration", topdown=False):
for dir in dirs:
dramsys_files.extend(Glob("%s/*.cpp" % os.path.join(root, dir)))
dramsys_files.extend(Glob("%s/*.cpp" % f"{src_root}/src/error"))
dramsys_files.extend(Glob(f"{src_root}/src/error/ECC/Bit.cpp"))
dramsys_files.extend(Glob(f"{src_root}/src/error/ECC/ECC.cpp"))
dramsys_files.extend(Glob(f"{src_root}/src/error/ECC/Word.cpp"))
dramsys_files.extend(Glob("%s/*.cpp" % f"{src_root}/src/common"))
dramsys_files.extend(Glob("%s/*.cpp" % f"{src_root}/src/common/configuration"))
dramsys_files.extend(Glob("%s/*.cpp" % f"{src_root}/src/common/configuration/memspec"))
dramsys_files.extend(Glob("%s/*.c" % f"{src_root}/src/common/third_party/sqlite-amalgamation"))
env.Prepend(CPPPATH=[
src_root + "/src",
src_root + "/src/common/configuration",
src_root + "/src/common/third_party/nlohmann/include",
])
env.Prepend(CPPDEFINES=[("DRAMSysResourceDirectory", '\\"' + os.getcwd() + '/resources' + '\\"')])
env.Prepend(CPPDEFINES=[("SYSTEMC_VERSION", 20191203)])
dramsys = env.Clone()
if '-Werror' in dramsys['CCFLAGS']:
dramsys['CCFLAGS'].remove('-Werror')
dramsys.Prepend(CPPPATH=[
src_root + "/src/common/third_party/sqlite-amalgamation",
build_root + "/systemc/ext"
])
dramsys.Prepend(CPPDEFINES=[("SQLITE_ENABLE_RTREE", "1")])
dramsys_configuration = env.Clone()
dramsys.Library('dramsys', dramsys_files)
env.Append(LIBS=['dramsys', 'dl'])
env.Append(LIBPATH=[Dir('.')])


@@ -30,7 +30,7 @@
Import('env')
env.Prepend(CPPPATH=Dir('./include'))
env.Prepend(CPPPATH=Dir('./include').srcnode())
fpenv = env.Clone()

ext/gdbremote/signals.hh (new file)

@@ -0,0 +1,181 @@
//===-- Generated From GDBRemoteSignals.cpp ------------------------===//
//
// Part of the LLVM Project,
// under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===---------------------------------------------------------------===//
#include <stdint.h>
#ifndef __BASE_GDB_SIGNALS_HH__
#define __BASE_GDB_SIGNALS_HH__
/*
These signals definitions are produced from LLVM's
lldb/source/Plugins/Process/Utility/GDBRemoteSignals.cpp
*/
namespace gem5{
enum class GDBSignal : uint8_t
{
ZERO = 0, //Signal 0
HUP = 1, //hangup
INT = 2, //interrupt
QUIT = 3, //quit
ILL = 4, //illegal instruction
TRAP = 5, //trace trap (not reset when caught)
ABRT = 6, //SIGIOT
EMT = 7, //emulation trap
FPE = 8, //floating point exception
KILL = 9, //kill
BUS = 10, //bus error
SEGV = 11, //segmentation violation
SYS = 12, //invalid system call
PIPE = 13, //write to pipe with reading end closed
ALRM = 14, //alarm
TERM = 15, //termination requested
URG = 16, //urgent data on socket
STOP = 17, //process stop
TSTP = 18, //tty stop
CONT = 19, //process continue
CHLD = 20, //SIGCLD
TTIN = 21, //background tty read
TTOU = 22, //background tty write
IO = 23, //input/output ready/Pollable event
XCPU = 24, //CPU resource exceeded
XFSZ = 25, //file size limit exceeded
VTALRM = 26, //virtual time alarm
PROF = 27, //profiling time alarm
WINCH = 28, //window size changes
LOST = 29, //resource lost
USR1 = 30, //user defined signal 1
USR2 = 31, //user defined signal 2
PWR = 32, //power failure
POLL = 33, //pollable event
WIND = 34, //SIGWIND
PHONE = 35, //SIGPHONE
WAITING = 36, //process's LWPs are blocked
LWP = 37, //signal LWP
DANGER = 38, //swap space dangerously low
GRANT = 39, //monitor mode granted
RETRACT = 40, //need to relinquish monitor mode
MSG = 41, //monitor mode data available
SOUND = 42, //sound completed
SAK = 43, //secure attention
PRIO = 44, //SIGPRIO
SIG33 = 45, //real-time event 33
SIG34 = 46, //real-time event 34
SIG35 = 47, //real-time event 35
SIG36 = 48, //real-time event 36
SIG37 = 49, //real-time event 37
SIG38 = 50, //real-time event 38
SIG39 = 51, //real-time event 39
SIG40 = 52, //real-time event 40
SIG41 = 53, //real-time event 41
SIG42 = 54, //real-time event 42
SIG43 = 55, //real-time event 43
SIG44 = 56, //real-time event 44
SIG45 = 57, //real-time event 45
SIG46 = 58, //real-time event 46
SIG47 = 59, //real-time event 47
SIG48 = 60, //real-time event 48
SIG49 = 61, //real-time event 49
SIG50 = 62, //real-time event 50
SIG51 = 63, //real-time event 51
SIG52 = 64, //real-time event 52
SIG53 = 65, //real-time event 53
SIG54 = 66, //real-time event 54
SIG55 = 67, //real-time event 55
SIG56 = 68, //real-time event 56
SIG57 = 69, //real-time event 57
SIG58 = 70, //real-time event 58
SIG59 = 71, //real-time event 59
SIG60 = 72, //real-time event 60
SIG61 = 73, //real-time event 61
SIG62 = 74, //real-time event 62
SIG63 = 75, //real-time event 63
CANCEL = 76, //LWP internal signal
SIG32 = 77, //real-time event 32
SIG64 = 78, //real-time event 64
SIG65 = 79, //real-time event 65
SIG66 = 80, //real-time event 66
SIG67 = 81, //real-time event 67
SIG68 = 82, //real-time event 68
SIG69 = 83, //real-time event 69
SIG70 = 84, //real-time event 70
SIG71 = 85, //real-time event 71
SIG72 = 86, //real-time event 72
SIG73 = 87, //real-time event 73
SIG74 = 88, //real-time event 74
SIG75 = 89, //real-time event 75
SIG76 = 90, //real-time event 76
SIG77 = 91, //real-time event 77
SIG78 = 92, //real-time event 78
SIG79 = 93, //real-time event 79
SIG80 = 94, //real-time event 80
SIG81 = 95, //real-time event 81
SIG82 = 96, //real-time event 82
SIG83 = 97, //real-time event 83
SIG84 = 98, //real-time event 84
SIG85 = 99, //real-time event 85
SIG86 = 100, //real-time event 86
SIG87 = 101, //real-time event 87
SIG88 = 102, //real-time event 88
SIG89 = 103, //real-time event 89
SIG90 = 104, //real-time event 90
SIG91 = 105, //real-time event 91
SIG92 = 106, //real-time event 92
SIG93 = 107, //real-time event 93
SIG94 = 108, //real-time event 94
SIG95 = 109, //real-time event 95
SIG96 = 110, //real-time event 96
SIG97 = 111, //real-time event 97
SIG98 = 112, //real-time event 98
SIG99 = 113, //real-time event 99
SIG100 = 114, //real-time event 100
SIG101 = 115, //real-time event 101
SIG102 = 116, //real-time event 102
SIG103 = 117, //real-time event 103
SIG104 = 118, //real-time event 104
SIG105 = 119, //real-time event 105
SIG106 = 120, //real-time event 106
SIG107 = 121, //real-time event 107
SIG108 = 122, //real-time event 108
SIG109 = 123, //real-time event 109
SIG110 = 124, //real-time event 110
SIG111 = 125, //real-time event 111
SIG112 = 126, //real-time event 112
SIG113 = 127, //real-time event 113
SIG114 = 128, //real-time event 114
SIG115 = 129, //real-time event 115
SIG116 = 130, //real-time event 116
SIG117 = 131, //real-time event 117
SIG118 = 132, //real-time event 118
SIG119 = 133, //real-time event 119
SIG120 = 134, //real-time event 120
SIG121 = 135, //real-time event 121
SIG122 = 136, //real-time event 122
SIG123 = 137, //real-time event 123
SIG124 = 138, //real-time event 124
SIG125 = 139, //real-time event 125
SIG126 = 140, //real-time event 126
SIG127 = 141, //real-time event 127
INFO = 142, //information request
unknown = 143, //unknown signal
EXC_BAD_ACCESS = 145, //could not access memory
EXC_BAD_INSTRUCTION = 146, //illegal instruction/operand
EXC_ARITHMETIC = 147, //arithmetic exception
EXC_EMULATION = 148, //emulation instruction
EXC_SOFTWARE = 149, //software generated exception
EXC_BREAKPOINT = 150, //breakpoint
LIBRT = 151, //librt internal signal
};
}
#endif /* __BASE_GDB_SIGNALS_HH__ */


@@ -41,6 +41,6 @@ Import('env')
env.Library('iostream3', [env.SharedObject('zfstream.cc')])
env.Prepend(CPPPATH=Dir('.'))
env.Prepend(CPPPATH=Dir('.').srcnode())
env.Append(LIBS=['iostream3'])
env.Prepend(LIBPATH=[Dir('.')])


@@ -127,16 +127,19 @@ if not SCons.Tool.m4.exists(m4env):
# Setup m4 tool
m4env.Tool('m4')
m4env.Append(M4FLAGS=['-DSRCDIR=%s' % Dir('.').path])
m4env.Append(M4FLAGS=['-DSRCDIR=%s' % Dir('.').srcnode().path])
m4env['M4COM'] = '$M4 $M4FLAGS $SOURCES > $TARGET'
m4env.M4(target=File('libelf_convert.c'),
source=[File('elf_types.m4'), File('libelf_convert.m4')])
source=[File('elf_types.m4').srcnode(),
File('libelf_convert.m4').srcnode()])
m4env.M4(target=File('libelf_fsize.c'),
source=[File('elf_types.m4'), File('libelf_fsize.m4')])
source=[File('elf_types.m4').srcnode(),
File('libelf_fsize.m4').srcnode()])
m4env.M4(target=File('libelf_msize.c'),
source=[File('elf_types.m4'), File('libelf_msize.m4')])
source=[File('elf_types.m4').srcnode(),
File('libelf_msize.m4').srcnode()])
m4env.Append(CPPPATH=Dir('.'))
m4env.Append(CPPPATH=[Dir('.'), Dir('.').srcnode()])
# Build libelf as a static library with PIC code so it can be linked
# into either m5 or the library
@@ -146,6 +149,6 @@ m4env.Library('elf', [m4env.SharedObject(f) for f in elf_files])
m4env.Command(File('native-elf-format.h'), File('native-elf-format'),
'${SOURCE} > ${TARGET}')
env.Prepend(CPPPATH=Dir('.'))
env.Prepend(CPPPATH=Dir('.').srcnode())
env.Append(LIBS=[File('libelf.a')])
env.Prepend(LIBPATH=[Dir('.')])


@@ -44,6 +44,6 @@ FdtFile('fdt_empty_tree.c')
FdtFile('fdt_strerror.c')
env.Library('fdt', [env.SharedObject(f) for f in fdt_files])
env.Prepend(CPPPATH=Dir('.'))
env.Prepend(CPPPATH=Dir('.').srcnode())
env.Append(LIBS=['fdt'])
env.Prepend(LIBPATH=[Dir('.')])


@@ -39,7 +39,7 @@
Import('env')
env.Prepend(CPPPATH=Dir('./include'))
env.Prepend(CPPPATH=Dir('include').srcnode())
nomali = env.Clone()
nomali.Append(CCFLAGS=['-Wno-ignored-qualifiers'])


@@ -1,6 +1,6 @@
version: 1.0.{build}
image:
- Visual Studio 2015
- Visual Studio 2017
test: off
skip_branch_with_pr: true
build:
@@ -11,11 +11,9 @@ environment:
matrix:
- PYTHON: 36
CONFIG: Debug
- PYTHON: 27
CONFIG: Debug
install:
- ps: |
$env:CMAKE_GENERATOR = "Visual Studio 14 2015"
$env:CMAKE_GENERATOR = "Visual Studio 15 2017"
if ($env:PLATFORM -eq "x64") { $env:PYTHON = "$env:PYTHON-x64" }
$env:PATH = "C:\Python$env:PYTHON\;C:\Python$env:PYTHON\Scripts\;$env:PATH"
python -W ignore -m pip install --upgrade pip wheel


@@ -3,19 +3,36 @@
# clang-format --style=llvm --dump-config
BasedOnStyle: LLVM
AccessModifierOffset: -4
AlignConsecutiveAssignments: true
AllowShortLambdasOnASingleLine: true
AlwaysBreakTemplateDeclarations: Yes
BinPackArguments: false
BinPackParameters: false
BreakBeforeBinaryOperators: All
BreakConstructorInitializers: BeforeColon
ColumnLimit: 99
CommentPragmas: 'NOLINT:.*|^ IWYU pragma:'
IncludeBlocks: Regroup
IndentCaseLabels: true
IndentPPDirectives: AfterHash
IndentWidth: 4
Language: Cpp
SpaceAfterCStyleCast: true
# SpaceInEmptyBlock: true # too new
Standard: Cpp11
StatementMacros: ['PyObject_HEAD']
TabWidth: 4
IncludeCategories:
- Regex: '<pybind11/.*'
Priority: -1
- Regex: 'pybind11.h"$'
Priority: 1
- Regex: '^".*/?detail/'
Priority: 1
SortPriority: 2
- Regex: '^"'
Priority: 1
SortPriority: 3
- Regex: '<[[:alnum:]._]+>'
Priority: 4
- Regex: '.*'
Priority: 5
...


@@ -1,13 +1,77 @@
FormatStyle: file
Checks: '
llvm-namespace-comment,
modernize-use-override,
readability-container-size-empty,
modernize-use-using,
modernize-use-equals-default,
modernize-use-auto,
modernize-use-emplace,
'
Checks: |
*bugprone*,
*performance*,
clang-analyzer-optin.cplusplus.VirtualCall,
clang-analyzer-optin.performance.Padding,
cppcoreguidelines-init-variables,
cppcoreguidelines-prefer-member-initializer,
cppcoreguidelines-pro-type-static-cast-downcast,
cppcoreguidelines-slicing,
google-explicit-constructor,
llvm-namespace-comment,
misc-definitions-in-headers,
misc-misplaced-const,
misc-non-copyable-objects,
misc-static-assert,
misc-throw-by-value-catch-by-reference,
misc-uniqueptr-reset-release,
misc-unused-parameters,
modernize-avoid-bind,
modernize-loop-convert,
modernize-make-shared,
modernize-redundant-void-arg,
modernize-replace-auto-ptr,
modernize-replace-disallow-copy-and-assign-macro,
modernize-replace-random-shuffle,
modernize-shrink-to-fit,
modernize-use-auto,
modernize-use-bool-literals,
modernize-use-default-member-init,
modernize-use-emplace,
modernize-use-equals-default,
modernize-use-equals-delete,
modernize-use-noexcept,
modernize-use-nullptr,
modernize-use-override,
modernize-use-using,
readability-avoid-const-params-in-decls,
readability-braces-around-statements,
readability-const-return-type,
readability-container-size-empty,
readability-delete-null-pointer,
readability-else-after-return,
readability-implicit-bool-conversion,
readability-inconsistent-declaration-parameter-name,
readability-make-member-function-const,
readability-misplaced-array-index,
readability-non-const-parameter,
readability-qualified-auto,
readability-redundant-function-ptr-dereference,
readability-redundant-smartptr-get,
readability-redundant-string-cstr,
readability-simplify-subscript-expr,
readability-static-accessed-through-instance,
readability-static-definition-in-anonymous-namespace,
readability-string-compare,
readability-suspicious-call-argument,
readability-uniqueptr-delete-release,
-bugprone-easily-swappable-parameters,
-bugprone-exception-escape,
-bugprone-reserved-identifier,
-bugprone-unused-raii,
CheckOptions:
- key: modernize-use-equals-default.IgnoreMacros
value: false
- key: performance-for-range-copy.WarnOnAllAutoCopies
value: true
- key: performance-inefficient-string-concatenation.StrictMode
value: true
- key: performance-unnecessary-value-param.AllowedTypes
value: 'exception_ptr$;'
- key: readability-implicit-bool-conversion.AllowPointerConditions
value: true
HeaderFilterRegex: 'pybind11/.*h'


@@ -0,0 +1,24 @@
template <op_id id, op_type ot, typename L = undefined_t, typename R = undefined_t>
template <typename ThisT>
auto &this_ = static_cast<ThisT &>(*this);
if (load_impl<ThisT>(temp, false)) {
ssize_t nd = 0;
auto trivial = broadcast(buffers, nd, shape);
auto ndim = (size_t) nd;
int nd;
ssize_t ndim() const { return detail::array_proxy(m_ptr)->nd; }
using op = op_impl<id, ot, Base, L_type, R_type>;
template <op_id id, op_type ot, typename L, typename R>
template <detail::op_id id, detail::op_type ot, typename L, typename R, typename... Extra>
class_ &def(const detail::op_<id, ot, L, R> &op, const Extra &...extra) {
class_ &def_cast(const detail::op_<id, ot, L, R> &op, const Extra &...extra) {
@pytest.mark.parametrize("access", ["ro", "rw", "static_ro", "static_rw"])
struct IntStruct {
explicit IntStruct(int v) : value(v){};
~IntStruct() { value = -value; }
IntStruct(const IntStruct &) = default;
IntStruct &operator=(const IntStruct &) = default;
py::class_<IntStruct>(m, "IntStruct").def(py::init([](const int i) { return IntStruct(i); }));
py::implicitly_convertible<int, IntStruct>();
m.def("test", [](int expected, const IntStruct &in) {
[](int expected, const IntStruct &in) {

ext/pybind11/.gitattributes (new vendored file)

@@ -0,0 +1 @@
docs/*.svg binary

ext/pybind11/.github/CODEOWNERS (new vendored file)

@@ -0,0 +1,9 @@
*.cmake @henryiii
CMakeLists.txt @henryiii
*.yml @henryiii
*.yaml @henryiii
/tools/ @henryiii
/pybind11/ @henryiii
noxfile.py @henryiii
.clang-format @henryiii
.clang-tidy @henryiii


@@ -53,6 +53,33 @@ derivative works thereof, in binary and source code form.
## Development of pybind11
### Quick setup
To setup a quick development environment, use [`nox`](https://nox.thea.codes).
This will allow you to do some common tasks with minimal setup effort, but will
take more time to run and be less flexible than a full development environment.
If you use [`pipx run nox`](https://pipx.pypa.io), you don't even need to
install `nox`. Examples:
```bash
# List all available sessions
nox -l
# Run linters
nox -s lint
# Run tests on Python 3.9
nox -s tests-3.9
# Build and preview docs
nox -s docs -- serve
# Build SDists and wheels
nox -s build
```
### Full setup
To setup an ideal development environment, run the following commands on a
system with CMake 3.14+:
@@ -66,11 +93,10 @@ cmake --build build -j4
Tips:
* You can use `virtualenv` (from PyPI) instead of `venv` (which is Python 3
only).
* You can use `virtualenv` (faster, from PyPI) instead of `venv`.
* You can select any name for your environment folder; if it contains "env" it
will be ignored by git.
* If you dont have CMake 3.14+, just add cmake to the pip install command.
* If you don't have CMake 3.14+, just add "cmake" to the pip install command.
* You can use `-DPYBIND11_FINDPYTHON=ON` to use FindPython on CMake 3.12+
* In classic mode, you may need to set `-DPYTHON_EXECUTABLE=/path/to/python`.
FindPython uses `-DPython_ROOT_DIR=/path/to` or
@@ -78,7 +104,7 @@ Tips:
### Configuration options
In CMake, configuration options are given with -D. Options are stored in the
In CMake, configuration options are given with "-D". Options are stored in the
build directory, in the `CMakeCache.txt` file, so they are remembered for each
build directory. Two selections are special - the generator, given with `-G`,
and the compiler, which is selected based on environment variables `CXX` and
@@ -88,12 +114,12 @@ after the initial run.
The valid options are:
* `-DCMAKE_BUILD_TYPE`: Release, Debug, MinSizeRel, RelWithDebInfo
* `-DPYBIND11_FINDPYTHON=ON`: Use CMake 3.12+s FindPython instead of the
* `-DPYBIND11_FINDPYTHON=ON`: Use CMake 3.12+'s FindPython instead of the
classic, deprecated, custom FindPythonLibs
* `-DPYBIND11_NOPYTHON=ON`: Disable all Python searching (disables tests)
* `-DBUILD_TESTING=ON`: Enable the tests
* `-DDOWNLOAD_CATCH=ON`: Download catch to build the C++ tests
* `-DOWNLOAD_EIGEN=ON`: Download Eigen for the NumPy tests
* `-DDOWNLOAD_EIGEN=ON`: Download Eigen for the NumPy tests
* `-DPYBIND11_INSTALL=ON/OFF`: Enable the install target (on by default for the
master project)
* `-DUSE_PYTHON_INSTALL_DIR=ON`: Try to install into the python dir
@@ -132,8 +158,9 @@ tests with these targets:
* `test_cmake_build`: Install / subdirectory tests
If you want to build just a subset of tests, use
`-DPYBIND11_TEST_OVERRIDE="test_callbacks.cpp;test_pickling.cpp"`. If this is
empty, all tests will be built.
`-DPYBIND11_TEST_OVERRIDE="test_callbacks;test_pickling"`. If this is
empty, all tests will be built. Tests are specified without an extension if they need both a .py and
.cpp file.
You may also pass flags to the `pytest` target by editing `tests/pytest.ini` or
by using the `PYTEST_ADDOPTS` environment variable
@@ -203,16 +230,19 @@ of the pybind11 repo.
[`clang-tidy`][clang-tidy] performs deeper static code analyses and is
more complex to run, compared to `clang-format`, but support for `clang-tidy`
is built into the pybind11 CMake configuration. To run `clang-tidy`, the
following recipe should work. Files will be modified in place, so you can
use git to monitor the changes.
following recipe should work. Run the `docker` command from the top-level
directory inside your pybind11 git clone. Files will be modified in place,
so you can use git to monitor the changes.
```bash
docker run --rm -v $PWD:/pybind11 -it silkeh/clang:10
apt-get update && apt-get install python3-dev python3-pytest
cmake -S pybind11/ -B build -DCMAKE_CXX_CLANG_TIDY="$(which clang-tidy);-fix"
cmake --build build
docker run --rm -v $PWD:/mounted_pybind11 -it silkeh/clang:13
apt-get update && apt-get install -y python3-dev python3-pytest
cmake -S /mounted_pybind11/ -B build -DCMAKE_CXX_CLANG_TIDY="$(which clang-tidy);--use-color" -DDOWNLOAD_EIGEN=ON -DDOWNLOAD_CATCH=ON -DCMAKE_CXX_STANDARD=17
cmake --build build -j 2
```
You can add `--fix` to the options list if you want.
### Include what you use
To run include what you use, install (`brew install include-what-you-use` on
@@ -228,7 +258,7 @@ The report is sent to stderr; you can pipe it into a file if you wish.
### Build recipes
This builds with the Intel compiler (assuming it is in your path, along with a
recent CMake and Python 3):
recent CMake and Python):
```bash
python3 -m venv venv


@@ -1,28 +0,0 @@
---
name: Bug Report
about: File an issue about a bug
title: "[BUG] "
---
Make sure you've completed the following steps before submitting your issue -- thank you!
1. Make sure you've read the [documentation][]. Your issue may be addressed there.
2. Search the [issue tracker][] to verify that this hasn't already been reported. +1 or comment there if it has.
3. Consider asking first in the [Gitter chat room][].
4. Include a self-contained and minimal piece of code that reproduces the problem. If that's not possible, try to make the description as clear as possible.
a. If possible, make a PR with a new, failing test to give us a starting point to work on!
[documentation]: https://pybind11.readthedocs.io
[issue tracker]: https://github.com/pybind/pybind11/issues
[Gitter chat room]: https://gitter.im/pybind/Lobby
*After reading, remove this checklist and the template text in parentheses below.*
## Issue description
(Provide a short description, state the expected behavior and what actually happens.)
## Reproducible example code
(The code should be minimal, have no external dependencies, isolate the function(s) that cause breakage. Submit matched and complete C++ and Python snippets that can be easily compiled and run to diagnose the issue.)


@@ -0,0 +1,61 @@
name: Bug Report
description: File an issue about a bug
title: "[BUG]: "
labels: [triage]
body:
- type: markdown
attributes:
value: |
Please do your best to make the issue as easy to act on as possible, and only submit here if there is clearly a problem with pybind11 (ask first if unsure). **Note that a reproducer in a PR is much more likely to get immediate attention.**
- type: checkboxes
id: steps
attributes:
label: Required prerequisites
description: Make sure you've completed the following steps before submitting your issue -- thank you!
options:
- label: Make sure you've read the [documentation](https://pybind11.readthedocs.io). Your issue may be addressed there.
required: true
- label: Search the [issue tracker](https://github.com/pybind/pybind11/issues) and [Discussions](https:/pybind/pybind11/discussions) to verify that this hasn't already been reported. +1 or comment there if it has.
required: true
- label: Consider asking first in the [Gitter chat room](https://gitter.im/pybind/Lobby) or in a [Discussion](https:/pybind/pybind11/discussions/new).
required: false
- type: input
id: version
attributes:
label: What version (or hash if on master) of pybind11 are you using?
validations:
required: true
- type: textarea
id: description
attributes:
label: Problem description
placeholder: >-
Provide a short description, state the expected behavior and what
actually happens. Include relevant information like what version of
pybind11 you are using, what system you are on, and any useful commands
/ output.
validations:
required: true
- type: textarea
id: code
attributes:
label: Reproducible example code
placeholder: >-
The code should be minimal, have no external dependencies, isolate the
function(s) that cause breakage. Submit matched and complete C++ and
Python snippets that can be easily compiled and run to diagnose the
issue. — Note that a reproducer in a PR is much more likely to get
immediate attention: failing tests in the pybind11 CI are the best
starting point for working out fixes.
render: text
- type: input
id: regression
attributes:
label: Is this a regression? Put the last known working version here if it is.
description: Put the last known working version here if this is a regression.
value: Not a regression


@@ -1,5 +1,8 @@
blank_issues_enabled: false
contact_links:
- name: Ask a question
url: https://github.com/pybind/pybind11/discussions/new
about: Please ask and answer questions here, or propose new ideas.
- name: Gitter room
url: https://gitter.im/pybind/Lobby
about: A room for discussing pybind11 with an active community


@@ -1,16 +0,0 @@
---
name: Feature Request
about: File an issue about adding a feature
title: "[FEAT] "
---
Make sure you've completed the following steps before submitting your issue -- thank you!
1. Check if your feature has already been mentioned / rejected / planned in other issues.
2. If those resources didn't help, consider asking in the [Gitter chat room][] to see if this is interesting / useful to a larger audience and possible to implement reasonably,
4. If you have a useful feature that passes the previous items (or not suitable for chat), please fill in the details below.
[Gitter chat room]: https://gitter.im/pybind/Lobby
*After reading, remove this checklist.*


@@ -1,21 +0,0 @@
---
name: Question
about: File an issue about unexplained behavior
title: "[QUESTION] "
---
If you have a question, please check the following first:
1. Check if your question has already been answered in the [FAQ][] section.
2. Make sure you've read the [documentation][]. Your issue may be addressed there.
3. If those resources didn't help and you only have a short question (not a bug report), consider asking in the [Gitter chat room][]
4. Search the [issue tracker][], including the closed issues, to see if your question has already been asked/answered. +1 or comment if it has been asked but has no answer.
5. If you have a more complex question which is not answered in the previous items (or not suitable for chat), please fill in the details below.
6. Include a self-contained and minimal piece of code that illustrates your question. If that's not possible, try to make the description as clear as possible.
[FAQ]: http://pybind11.readthedocs.io/en/latest/faq.html
[documentation]: https://pybind11.readthedocs.io
[issue tracker]: https://github.com/pybind/pybind11/issues
[Gitter chat room]: https://gitter.im/pybind/Lobby
*After reading, remove this checklist.*

@@ -5,12 +5,3 @@ updates:
directory: "/"
schedule:
interval: "daily"
ignore:
# Official actions have moving tags like v1
# that are used, so they don't need updates here
- dependency-name: "actions/checkout"
- dependency-name: "actions/setup-python"
- dependency-name: "actions/cache"
- dependency-name: "actions/upload-artifact"
- dependency-name: "actions/download-artifact"
- dependency-name: "actions/labeler"

@@ -0,0 +1,32 @@
{
"problemMatcher": [
{
"severity": "warning",
"pattern": [
{
"regexp": "^([^:]+):(\\d+):(\\d+): ([A-DF-Z]\\d+): \\033\\[[\\d;]+m([^\\033]+).*$",
"file": 1,
"line": 2,
"column": 3,
"code": 4,
"message": 5
}
],
"owner": "pylint-warning"
},
{
"severity": "error",
"pattern": [
{
"regexp": "^([^:]+):(\\d+):(\\d+): (E\\d+): \\033\\[[\\d;]+m([^\\033]+).*$",
"file": 1,
"line": 2,
"column": 3,
"code": 4,
"message": 5
}
],
"owner": "pylint-error"
}
]
}
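The warning matcher's regular expression above can be sanity-checked outside of GitHub Actions. The sketch below (the file name and message are made up) assumes `\033` is interpreted as the ESC character, as Python's `re` module does:

```python
import re

# The "pylint-warning" pattern from the JSON above, with JSON escaping undone.
pattern = re.compile(
    r"^([^:]+):(\d+):(\d+): ([A-DF-Z]\d+): \033\[[\d;]+m([^\033]+).*$"
)

# A hypothetical colorized pylint warning line (\033[33m ... \033[0m is ANSI yellow).
line = "tests/conftest.py:12:4: W0611: \033[33mUnused import os (unused-import)\033[0m"

m = pattern.match(line)
# Groups map to file, line, column, code, message per the matcher config.
print(m.group(1), m.group(2), m.group(3), m.group(4), m.group(5))
```

Note that the character class `[A-DF-Z]` deliberately skips `E`, so `E…` codes do not match here and instead fall through to the second, error-severity matcher.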

@@ -1,3 +1,7 @@
<!--
Title (above): please place [branch_name] at the beginning if you are targeting a branch other than master. *Do not target stable*.
It is recommended to use conventional commit format, see conventionalcommits.org, but not required.
-->
## Description
<!-- Include relevant issues or PRs here, describe what changed and why -->

@@ -9,6 +9,17 @@ on:
- stable
- v*
concurrency:
group: test-${{ github.ref }}
cancel-in-progress: true
env:
PIP_ONLY_BINARY: numpy
FORCE_COLOR: 3
PYTEST_TIMEOUT: 300
# For cmake:
VERBOSE: 1
jobs:
# This is the "main" test suite, which tests a large number of different
# versions of default compilers and Python versions in GitHub Actions.
@@ -16,66 +27,66 @@ jobs:
strategy:
fail-fast: false
matrix:
runs-on: [ubuntu-latest, windows-latest, macos-latest]
runs-on: [ubuntu-20.04, windows-2022, macos-latest]
python:
- 2.7
- 3.5
- 3.6
- 3.9
# - 3.10-dev # Re-enable once 3.10.0a5 is released
- pypy2
- pypy3
- '3.6'
- '3.9'
- '3.10'
- '3.11'
- 'pypy-3.7'
- 'pypy-3.8'
- 'pypy-3.9'
# Items in here will either be added to the build matrix (if not
# present), or add new keys to an existing matrix element if all the
# existing keys match.
#
# We support three optional keys: args (both build), args1 (first
# build), and args2 (second build).
# We support an optional key: args, for cmake args
include:
# Just add a key
- runs-on: ubuntu-latest
python: 3.6
- runs-on: ubuntu-20.04
python: '3.6'
args: >
-DPYBIND11_FINDPYTHON=ON
- runs-on: windows-latest
python: 3.6
-DCMAKE_CXX_FLAGS="-D_=1"
- runs-on: ubuntu-20.04
python: 'pypy-3.8'
args: >
-DPYBIND11_FINDPYTHON=ON
# These items will be removed from the build matrix, keys must match.
exclude:
# Currently 32bit only, and we build 64bit
- runs-on: windows-latest
python: pypy2
- runs-on: windows-latest
python: pypy3
# TODO: PyPy2 7.3.3 segfaults, while 7.3.2 was fine.
- runs-on: ubuntu-latest
python: pypy2
- runs-on: windows-2019
python: '3.6'
args: >
-DPYBIND11_FINDPYTHON=ON
# Inject a couple Windows 2019 runs
- runs-on: windows-2019
python: '3.9'
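The `include`/`exclude` comments above describe how GitHub Actions expands this build matrix. A simplified model of that expansion (the function and the sample values here are illustrative, not the Actions implementation) is:

```python
from itertools import product


def expand_matrix(axes, include=(), exclude=()):
    # Cross product of the axis values (e.g. runs-on x python).
    combos = [dict(zip(axes, values)) for values in product(*axes.values())]
    # `exclude` drops combinations whose listed keys all match.
    combos = [
        c for c in combos
        if not any(all(c.get(k) == v for k, v in e.items()) for e in exclude)
    ]
    # `include` adds keys to matching combinations, or appends a new one.
    for inc in include:
        shared = {k: v for k, v in inc.items() if k in axes}
        matches = [c for c in combos
                   if all(c.get(k) == v for k, v in shared.items())]
        if matches:
            for c in matches:
                c.update(inc)
        else:
            combos.append(dict(inc))
    return combos


matrix = expand_matrix(
    {"runs-on": ["ubuntu-20.04", "windows-2022"], "python": ["3.6", "3.9"]},
    include=[{"runs-on": "ubuntu-20.04", "python": "3.6",
              "args": "-DPYBIND11_FINDPYTHON=ON"}],
    exclude=[{"runs-on": "windows-2022", "python": "3.6"}],
)
```

With these sample axes, the exclude removes one of the four combinations and the include attaches extra cmake `args` to the matching ubuntu-20.04 / 3.6 entry rather than creating a new one.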
name: "🐍 ${{ matrix.python }} • ${{ matrix.runs-on }} • x64 ${{ matrix.args }}"
runs-on: ${{ matrix.runs-on }}
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v3
- name: Setup Python ${{ matrix.python }}
uses: actions/setup-python@v2
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python }}
- name: Setup Boost (Windows / Linux latest)
shell: bash
run: echo "BOOST_ROOT=$BOOST_ROOT_1_72_0" >> $GITHUB_ENV
- name: Setup Boost (Linux)
# Can't use boost + define _
if: runner.os == 'Linux' && matrix.python != '3.6'
run: sudo apt-get install libboost-dev
- name: Setup Boost (macOS)
if: runner.os == 'macOS'
run: brew install boost
- name: Update CMake
uses: jwlawson/actions-setup-cmake@v1.7
uses: jwlawson/actions-setup-cmake@v1.13
- name: Cache wheels
if: runner.os == 'macOS'
uses: actions/cache@v2
uses: actions/cache@v3
with:
# This path is specific to macOS - we really only need it for PyPy NumPy wheels
# See https://github.com/actions/cache/blob/master/examples.md#python---pip
@@ -85,17 +96,20 @@ jobs:
key: ${{ runner.os }}-pip-${{ matrix.python }}-x64-${{ hashFiles('tests/requirements.txt') }}
- name: Prepare env
run: python -m pip install -r tests/requirements.txt --prefer-binary
run: |
python -m pip install -r tests/requirements.txt
- name: Setup annotations on Linux
if: runner.os == 'Linux'
run: python -m pip install pytest-github-actions-annotate-failures
# First build - C++11 mode and inplace
# More-or-less randomly adding -DPYBIND11_SIMPLE_GIL_MANAGEMENT=ON here.
- name: Configure C++11 ${{ matrix.args }}
run: >
cmake -S . -B .
-DPYBIND11_WERROR=ON
-DPYBIND11_SIMPLE_GIL_MANAGEMENT=ON
-DDOWNLOAD_CATCH=ON
-DDOWNLOAD_EIGEN=ON
-DCMAKE_CXX_STANDARD=11
@@ -109,7 +123,7 @@ jobs:
- name: C++11 tests
# TODO: Figure out how to load the DLL on Python 3.8+
if: "!(runner.os == 'Windows' && (matrix.python == 3.8 || matrix.python == 3.9 || matrix.python == '3.10-dev'))"
if: "!(runner.os == 'Windows' && (matrix.python == 3.8 || matrix.python == 3.9 || matrix.python == '3.10' || matrix.python == '3.11' || matrix.python == 'pypy-3.8'))"
run: cmake --build . --target cpptest -j 2
- name: Interface test C++11
@@ -119,15 +133,16 @@ jobs:
run: git clean -fdx
# Second build - C++17 mode and in a build directory
- name: Configure ${{ matrix.args2 }}
# More-or-less randomly adding -DPYBIND11_SIMPLE_GIL_MANAGEMENT=OFF here.
- name: Configure C++17
run: >
cmake -S . -B build2
-DPYBIND11_WERROR=ON
-DPYBIND11_SIMPLE_GIL_MANAGEMENT=OFF
-DDOWNLOAD_CATCH=ON
-DDOWNLOAD_EIGEN=ON
-DCMAKE_CXX_STANDARD=17
${{ matrix.args }}
${{ matrix.args2 }}
- name: Build
run: cmake --build build2 -j 2
@@ -137,32 +152,35 @@ jobs:
- name: C++ tests
# TODO: Figure out how to load the DLL on Python 3.8+
if: "!(runner.os == 'Windows' && (matrix.python == 3.8 || matrix.python == 3.9 || matrix.python == '3.10-dev'))"
if: "!(runner.os == 'Windows' && (matrix.python == 3.8 || matrix.python == 3.9 || matrix.python == '3.10' || matrix.python == '3.11' || matrix.python == 'pypy-3.8'))"
run: cmake --build build2 --target cpptest
# Third build - C++17 mode with unstable ABI
- name: Configure (unstable ABI)
run: >
cmake -S . -B build3
-DPYBIND11_WERROR=ON
-DDOWNLOAD_CATCH=ON
-DDOWNLOAD_EIGEN=ON
-DCMAKE_CXX_STANDARD=17
-DPYBIND11_INTERNALS_VERSION=10000000
"-DPYBIND11_TEST_OVERRIDE=test_call_policies.cpp;test_gil_scoped.cpp;test_thread.cpp"
${{ matrix.args }}
- name: Build (unstable ABI)
run: cmake --build build3 -j 2
- name: Python tests (unstable ABI)
run: cmake --build build3 --target pytest
- name: Interface test
run: cmake --build build2 --target test_cmake_build
# Eventually Microsoft might have an action for setting up
# MSVC, but for now, this action works:
- name: Prepare compiler environment for Windows 🐍 2.7
if: matrix.python == 2.7 && runner.os == 'Windows'
uses: ilammy/msvc-dev-cmd@v1
with:
arch: x64
# This makes two environment variables available in the following step(s)
- name: Set Windows 🐍 2.7 environment variables
if: matrix.python == 2.7 && runner.os == 'Windows'
shell: bash
run: |
echo "DISTUTILS_USE_SDK=1" >> $GITHUB_ENV
echo "MSSdk=1" >> $GITHUB_ENV
# This makes sure the setup_helpers module can build packages using
# setuptools
- name: Setuptools helpers test
run: pytest tests/extra_setuptools
if: "!(matrix.runs-on == 'windows-2022')"
deadsnakes:
@@ -170,30 +188,31 @@ jobs:
fail-fast: false
matrix:
include:
- python-version: 3.9
# TODO: Fails on 3.10, investigate
- python-version: "3.9"
python-debug: true
valgrind: true
- python-version: 3.10-dev
- python-version: "3.11"
python-debug: false
name: "🐍 ${{ matrix.python-version }}${{ matrix.python-debug && '-dbg' || '' }} (deadsnakes)${{ matrix.valgrind && ' • Valgrind' || '' }} • x64"
runs-on: ubuntu-latest
runs-on: ubuntu-20.04
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v3
- name: Setup Python ${{ matrix.python-version }} (deadsnakes)
uses: deadsnakes/action@v2.1.1
uses: deadsnakes/action@v3.0.0
with:
python-version: ${{ matrix.python-version }}
debug: ${{ matrix.python-debug }}
- name: Update CMake
uses: jwlawson/actions-setup-cmake@v1.7
uses: jwlawson/actions-setup-cmake@v1.13
- name: Valgrind cache
if: matrix.valgrind
uses: actions/cache@v2
uses: actions/cache@v3
id: cache-valgrind
with:
path: valgrind
@@ -218,9 +237,12 @@ jobs:
sudo apt-get install libc6-dbg # Needed by Valgrind
- name: Prepare env
run: python -m pip install -r tests/requirements.txt --prefer-binary
run: |
python -m pip install -r tests/requirements.txt
- name: Configure
env:
SETUPTOOLS_USE_DISTUTILS: stdlib
run: >
cmake -S . -B build
-DCMAKE_BUILD_TYPE=Debug
@@ -261,16 +283,22 @@ jobs:
include:
- clang: 5
std: 14
- clang: 10
std: 20
- clang: 10
std: 17
- clang: 11
std: 20
- clang: 12
std: 20
- clang: 13
std: 20
- clang: 14
std: 20
name: "🐍 3 • Clang ${{ matrix.clang }} • C++${{ matrix.std }} • x64"
container: "silkeh/clang:${{ matrix.clang }}"
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v3
- name: Add wget and python3
run: apt-get update && apt-get install -y python3-dev python3-numpy python3-pytest libeigen3-dev
@@ -300,11 +328,11 @@ jobs:
# Testing NVCC; forces sources to behave like .cu files
cuda:
runs-on: ubuntu-latest
name: "🐍 3.8 • CUDA 11 • Ubuntu 20.04"
container: nvidia/cuda:11.0-devel-ubuntu20.04
name: "🐍 3.10 • CUDA 11.7 • Ubuntu 22.04"
container: nvidia/cuda:11.7.0-devel-ubuntu22.04
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v3
# tzdata will try to ask for the timezone, so set the DEBIAN_FRONTEND
- name: Install 🐍 3
@@ -328,7 +356,7 @@ jobs:
# container: centos:8
#
# steps:
# - uses: actions/checkout@v2
# - uses: actions/checkout@v3
#
# - name: Add Python 3 and a few requirements
# run: yum update -y && yum install -y git python3-devel python3-numpy python3-pytest make environment-modules
@@ -367,32 +395,32 @@ jobs:
# Testing on CentOS 7 + PGI compilers, which seems to require more workarounds
centos-nvhpc7:
runs-on: ubuntu-latest
name: "🐍 3 • CentOS7 / PGI 20.9 • x64"
name: "🐍 3 • CentOS7 / PGI 22.9 • x64"
container: centos:7
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v3
- name: Add Python 3 and a few requirements
run: yum update -y && yum install -y epel-release && yum install -y git python3-devel make environment-modules cmake3
run: yum update -y && yum install -y epel-release && yum install -y git python3-devel make environment-modules cmake3 yum-utils
- name: Install NVidia HPC SDK
run: yum -y install https://developer.download.nvidia.com/hpc-sdk/20.9/nvhpc-20-9-20.9-1.x86_64.rpm https://developer.download.nvidia.com/hpc-sdk/20.9/nvhpc-2020-20.9-1.x86_64.rpm
run: yum-config-manager --add-repo https://developer.download.nvidia.com/hpc-sdk/rhel/nvhpc.repo && yum -y install nvhpc-22.9
# On CentOS 7, we have to filter a few tests (compiler internal error)
# and allow deeper templete recursion (not needed on CentOS 8 with a newer
# and allow deeper template recursion (not needed on CentOS 8 with a newer
# standard library). On some systems, you may need further workarounds:
# https://github.com/pybind/pybind11/pull/2475
- name: Configure
shell: bash
run: |
source /etc/profile.d/modules.sh
module load /opt/nvidia/hpc_sdk/modulefiles/nvhpc/20.9
module load /opt/nvidia/hpc_sdk/modulefiles/nvhpc/22.9
cmake3 -S . -B build -DDOWNLOAD_CATCH=ON \
-DCMAKE_CXX_STANDARD=11 \
-DPYTHON_EXECUTABLE=$(python3 -c "import sys; print(sys.executable)") \
-DCMAKE_CXX_FLAGS="-Wc,--pending_instantiations=0" \
-DPYBIND11_TEST_FILTER="test_smart_ptr.cpp;test_virtual_functions.cpp"
-DPYBIND11_TEST_FILTER="test_smart_ptr.cpp"
# Building before installing Pip should produce a warning but not an error
- name: Build
@@ -419,20 +447,20 @@ jobs:
strategy:
fail-fast: false
matrix:
gcc:
- 7
- latest
std:
- 11
include:
- gcc: 10
std: 20
- { gcc: 7, std: 11 }
- { gcc: 7, std: 17 }
- { gcc: 8, std: 14 }
- { gcc: 8, std: 17 }
- { gcc: 10, std: 17 }
- { gcc: 11, std: 20 }
- { gcc: 12, std: 20 }
name: "🐍 3 • GCC ${{ matrix.gcc }} • C++${{ matrix.std }}• x64"
container: "gcc:${{ matrix.gcc }}"
steps:
- uses: actions/checkout@v1
- uses: actions/checkout@v3
- name: Add Python 3
run: apt-get update; apt-get install -y python3-dev python3-numpy python3-pytest python3-pip libeigen3-dev
@@ -441,7 +469,7 @@ jobs:
run: python3 -m pip install --upgrade pip
- name: Update CMake
uses: jwlawson/actions-setup-cmake@v1.7
uses: jwlawson/actions-setup-cmake@v1.13
- name: Configure
shell: bash
@@ -474,7 +502,7 @@ jobs:
name: "🐍 3 • ICC latest • x64"
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v3
- name: Add apt repo
run: |
@@ -495,7 +523,7 @@ jobs:
- name: Install dependencies
run: |
set +e; source /opt/intel/oneapi/setvars.sh; set -e
python3 -m pip install -r tests/requirements.txt --prefer-binary
python3 -m pip install -r tests/requirements.txt
- name: Configure C++11
run: |
@@ -569,29 +597,37 @@ jobs:
strategy:
fail-fast: false
matrix:
centos:
- 7 # GCC 4.8
- 8
container:
- "centos:7" # GCC 4.8
- "almalinux:8"
- "almalinux:9"
name: "🐍 3 • CentOS ${{ matrix.centos }} • x64"
container: "centos:${{ matrix.centos }}"
name: "🐍 3 • ${{ matrix.container }} • x64"
container: "${{ matrix.container }}"
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v3
- name: Add Python 3
- name: Add Python 3 (RHEL 7)
if: matrix.container == 'centos:7'
run: yum update -y && yum install -y python3-devel gcc-c++ make git
- name: Add Python 3 (RHEL 8+)
if: matrix.container != 'centos:7'
run: dnf update -y && dnf install -y python3-devel gcc-c++ make git
- name: Update pip
run: python3 -m pip install --upgrade pip
- name: Install dependencies
run: python3 -m pip install cmake -r tests/requirements.txt --prefer-binary
run: |
python3 -m pip install cmake -r tests/requirements.txt
- name: Configure
shell: bash
run: >
cmake -S . -B build
-DCMAKE_BUILD_TYPE=MinSizeRel
-DPYBIND11_WERROR=ON
-DDOWNLOAD_CATCH=ON
-DDOWNLOAD_EIGEN=ON
@@ -613,18 +649,18 @@ jobs:
# This tests an "install" with the CMake tools
install-classic:
name: "🐍 3.5 • Debian • x86 • Install"
name: "🐍 3.7 • Debian • x86 • Install"
runs-on: ubuntu-latest
container: i386/debian:stretch
container: i386/debian:buster
steps:
- uses: actions/checkout@v1
- uses: actions/checkout@v1 # Required to run inside docker
- name: Install requirements
run: |
apt-get update
apt-get install -y git make cmake g++ libeigen3-dev python3-dev python3-pip
pip3 install "pytest==3.1.*"
pip3 install "pytest==6.*"
- name: Configure for install
run: >
@@ -649,33 +685,32 @@ jobs:
-DPYTHON_EXECUTABLE=$(python3 -c "import sys; print(sys.executable)")
working-directory: /build-tests
- name: Run tests
- name: Python tests
run: make pytest -j 2
working-directory: /build-tests
# This verifies that the documentation is not horribly broken, and does a
# basic sanity check on the SDist.
# basic validation check on the SDist.
doxygen:
name: "Documentation build test"
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v3
- uses: actions/setup-python@v2
- uses: actions/setup-python@v4
with:
python-version: "3.x"
- name: Install Doxygen
run: sudo apt-get install -y doxygen librsvg2-bin # Changed to rsvg-convert in 20.04
- name: Install docs & setup requirements
run: python3 -m pip install -r docs/requirements.txt
- name: Build docs
run: python3 -m sphinx -W -b html docs docs/.build
run: pipx run nox -s docs
- name: Make SDist
run: python3 setup.py sdist
run: pipx run nox -s build -- --sdist
- run: git status --ignored
@@ -687,7 +722,7 @@ jobs:
- name: Compare Dists (headers only)
working-directory: include
run: |
python3 -m pip install --user -U ../dist/*
python3 -m pip install --user -U ../dist/*.tar.gz
installed=$(python3 -c "import pybind11; print(pybind11.get_include() + '/pybind11')")
diff -rq $installed ./pybind11
@@ -696,42 +731,43 @@ jobs:
fail-fast: false
matrix:
python:
- 3.5
- 3.6
- 3.7
- 3.8
- 3.9
- pypy3
# TODO: fix hang on pypy2
include:
- python: 3.9
args: -DCMAKE_CXX_STANDARD=20 -DDOWNLOAD_EIGEN=OFF
args: -DCMAKE_CXX_STANDARD=20
- python: 3.8
args: -DCMAKE_CXX_STANDARD=17
- python: 3.7
args: -DCMAKE_CXX_STANDARD=14
name: "🐍 ${{ matrix.python }} • MSVC 2019 • x86 ${{ matrix.args }}"
runs-on: windows-latest
runs-on: windows-2019
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v3
- name: Setup Python ${{ matrix.python }}
uses: actions/setup-python@v2
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python }}
architecture: x86
- name: Update CMake
uses: jwlawson/actions-setup-cmake@v1.7
uses: jwlawson/actions-setup-cmake@v1.13
- name: Prepare MSVC
uses: ilammy/msvc-dev-cmd@v1
uses: ilammy/msvc-dev-cmd@v1.12.0
with:
arch: x86
- name: Prepare env
run: python -m pip install -r tests/requirements.txt --prefer-binary
run: |
python -m pip install -r tests/requirements.txt
# First build - C++11 mode and inplace
- name: Configure ${{ matrix.args }}
@@ -745,102 +781,324 @@ jobs:
- name: Build C++11
run: cmake --build build -j 2
- name: Run tests
- name: Python tests
run: cmake --build build -t pytest
win32-msvc2015:
name: "🐍 ${{ matrix.python }} • MSVC 2015 • x64"
runs-on: windows-latest
win32-debug:
strategy:
fail-fast: false
matrix:
python:
- 2.7
- 3.6
- 3.7
# todo: check/cpptest does not support 3.8+ yet
steps:
- uses: actions/checkout@v2
- name: Setup 🐍 ${{ matrix.python }}
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python }}
- name: Update CMake
uses: jwlawson/actions-setup-cmake@v1.7
- name: Prepare MSVC
uses: ilammy/msvc-dev-cmd@v1
with:
toolset: 14.0
- name: Prepare env
run: python -m pip install -r tests/requirements.txt --prefer-binary
# First build - C++11 mode and inplace
- name: Configure
run: >
cmake -S . -B build
-G "Visual Studio 14 2015" -A x64
-DPYBIND11_WERROR=ON
-DDOWNLOAD_CATCH=ON
-DDOWNLOAD_EIGEN=ON
- name: Build C++14
run: cmake --build build -j 2
- name: Run all checks
run: cmake --build build -t check
win32-msvc2017:
name: "🐍 ${{ matrix.python }} • MSVC 2017 • x64"
runs-on: windows-2016
strategy:
fail-fast: false
matrix:
python:
- 2.7
- 3.5
- 3.7
std:
- 14
- 3.8
- 3.9
include:
- python: 2.7
std: 17
args: >
-DCMAKE_CXX_FLAGS="/permissive- /EHsc /GR"
- python: 3.9
args: -DCMAKE_CXX_STANDARD=20
- python: 3.8
args: -DCMAKE_CXX_STANDARD=17
name: "🐍 ${{ matrix.python }} • MSVC 2019 (Debug) • x86 ${{ matrix.args }}"
runs-on: windows-2019
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v3
- name: Setup 🐍 ${{ matrix.python }}
uses: actions/setup-python@v2
- name: Setup Python ${{ matrix.python }}
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python }}
architecture: x86
- name: Update CMake
uses: jwlawson/actions-setup-cmake@v1.7
uses: jwlawson/actions-setup-cmake@v1.13
- name: Prepare MSVC
uses: ilammy/msvc-dev-cmd@v1.12.0
with:
arch: x86
- name: Prepare env
run: python -m pip install -r tests/requirements.txt --prefer-binary
run: |
python -m pip install -r tests/requirements.txt
# First build - C++11 mode and inplace
- name: Configure
- name: Configure ${{ matrix.args }}
run: >
cmake -S . -B build
-G "Visual Studio 15 2017" -A x64
-G "Visual Studio 16 2019" -A Win32
-DCMAKE_BUILD_TYPE=Debug
-DPYBIND11_WERROR=ON
-DDOWNLOAD_CATCH=ON
-DDOWNLOAD_EIGEN=ON
-DCMAKE_CXX_STANDARD=${{ matrix.std }}
${{ matrix.args }}
- name: Build C++11
run: cmake --build build --config Debug -j 2
- name: Build ${{ matrix.std }}
- name: Python tests
run: cmake --build build --config Debug -t pytest
windows-2022:
strategy:
fail-fast: false
matrix:
python:
- 3.9
name: "🐍 ${{ matrix.python }} • MSVC 2022 C++20 • x64"
runs-on: windows-2022
steps:
- uses: actions/checkout@v3
- name: Setup Python ${{ matrix.python }}
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python }}
- name: Prepare env
run: |
python3 -m pip install -r tests/requirements.txt
- name: Update CMake
uses: jwlawson/actions-setup-cmake@v1.13
- name: Configure C++20
run: >
cmake -S . -B build
-DPYBIND11_WERROR=ON
-DDOWNLOAD_CATCH=ON
-DDOWNLOAD_EIGEN=ON
-DCMAKE_CXX_STANDARD=20
- name: Build C++20
run: cmake --build build -j 2
- name: Run all checks
run: cmake --build build -t check
- name: Python tests
run: cmake --build build --target pytest
- name: C++20 tests
run: cmake --build build --target cpptest -j 2
- name: Interface test C++20
run: cmake --build build --target test_cmake_build
mingw:
name: "🐍 3 • windows-latest • ${{ matrix.sys }}"
runs-on: windows-latest
defaults:
run:
shell: msys2 {0}
strategy:
fail-fast: false
matrix:
include:
- { sys: mingw64, env: x86_64 }
- { sys: mingw32, env: i686 }
steps:
- uses: msys2/setup-msys2@v2
with:
msystem: ${{matrix.sys}}
install: >-
git
mingw-w64-${{matrix.env}}-gcc
mingw-w64-${{matrix.env}}-python-pip
mingw-w64-${{matrix.env}}-python-numpy
mingw-w64-${{matrix.env}}-python-scipy
mingw-w64-${{matrix.env}}-cmake
mingw-w64-${{matrix.env}}-make
mingw-w64-${{matrix.env}}-python-pytest
mingw-w64-${{matrix.env}}-eigen3
mingw-w64-${{matrix.env}}-boost
mingw-w64-${{matrix.env}}-catch
- uses: actions/checkout@v3
- name: Configure C++11
# LTO leads to many undefined reference like
# `pybind11::detail::function_call::function_call(pybind11::detail::function_call&&)
run: cmake -G "MinGW Makefiles" -DCMAKE_CXX_STANDARD=11 -DPYBIND11_WERROR=ON -DDOWNLOAD_CATCH=ON -S . -B build
- name: Build C++11
run: cmake --build build -j 2
- name: Python tests C++11
run: cmake --build build --target pytest -j 2
- name: C++11 tests
run: PYTHONHOME=/${{matrix.sys}} PYTHONPATH=/${{matrix.sys}} cmake --build build --target cpptest -j 2
- name: Interface test C++11
run: PYTHONHOME=/${{matrix.sys}} PYTHONPATH=/${{matrix.sys}} cmake --build build --target test_cmake_build
- name: Clean directory
run: git clean -fdx
- name: Configure C++14
run: cmake -G "MinGW Makefiles" -DCMAKE_CXX_STANDARD=14 -DPYBIND11_WERROR=ON -DDOWNLOAD_CATCH=ON -S . -B build2
- name: Build C++14
run: cmake --build build2 -j 2
- name: Python tests C++14
run: cmake --build build2 --target pytest -j 2
- name: C++14 tests
run: PYTHONHOME=/${{matrix.sys}} PYTHONPATH=/${{matrix.sys}} cmake --build build2 --target cpptest -j 2
- name: Interface test C++14
run: PYTHONHOME=/${{matrix.sys}} PYTHONPATH=/${{matrix.sys}} cmake --build build2 --target test_cmake_build
- name: Clean directory
run: git clean -fdx
- name: Configure C++17
run: cmake -G "MinGW Makefiles" -DCMAKE_CXX_STANDARD=17 -DPYBIND11_WERROR=ON -DDOWNLOAD_CATCH=ON -S . -B build3
- name: Build C++17
run: cmake --build build3 -j 2
- name: Python tests C++17
run: cmake --build build3 --target pytest -j 2
- name: C++17 tests
run: PYTHONHOME=/${{matrix.sys}} PYTHONPATH=/${{matrix.sys}} cmake --build build3 --target cpptest -j 2
- name: Interface test C++17
run: PYTHONHOME=/${{matrix.sys}} PYTHONPATH=/${{matrix.sys}} cmake --build build3 --target test_cmake_build
windows_clang:
strategy:
matrix:
os: [windows-latest]
python: ['3.10']
runs-on: "${{ matrix.os }}"
name: "🐍 ${{ matrix.python }} • ${{ matrix.os }} • clang-latest"
steps:
- name: Show env
run: env
- name: Checkout
uses: actions/checkout@v3
- name: Set up Clang
uses: egor-tensin/setup-clang@v1
- name: Setup Python ${{ matrix.python }}
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python }}
- name: Update CMake
uses: jwlawson/actions-setup-cmake@v1.13
- name: Install ninja-build tool
uses: seanmiddleditch/gha-setup-ninja@v3
- name: Run pip installs
run: |
python -m pip install --upgrade pip
python -m pip install -r tests/requirements.txt
- name: Show Clang++ version
run: clang++ --version
- name: Show CMake version
run: cmake --version
# TODO: WERROR=ON
- name: Configure Clang
run: >
cmake -G Ninja -S . -B .
-DPYBIND11_WERROR=OFF
-DPYBIND11_SIMPLE_GIL_MANAGEMENT=OFF
-DDOWNLOAD_CATCH=ON
-DDOWNLOAD_EIGEN=ON
-DCMAKE_CXX_COMPILER=clang++
-DCMAKE_CXX_STANDARD=17
- name: Build
run: cmake --build . -j 2
- name: Python tests
run: cmake --build . --target pytest -j 2
- name: C++ tests
run: cmake --build . --target cpptest -j 2
- name: Interface test
run: cmake --build . --target test_cmake_build -j 2
- name: Clean directory
run: git clean -fdx
macos_brew_install_llvm:
name: "macos-latest • brew install llvm"
runs-on: macos-latest
env:
# https://apple.stackexchange.com/questions/227026/how-to-install-recent-clang-with-homebrew
LDFLAGS: '-L/usr/local/opt/llvm/lib -Wl,-rpath,/usr/local/opt/llvm/lib'
steps:
- name: Update PATH
run: echo "/usr/local/opt/llvm/bin" >> $GITHUB_PATH
- name: Show env
run: env
- name: Checkout
uses: actions/checkout@v3
- name: Show Clang++ version before brew install llvm
run: clang++ --version
- name: brew install llvm
run: brew install llvm
- name: Show Clang++ version after brew install llvm
run: clang++ --version
- name: Update CMake
uses: jwlawson/actions-setup-cmake@v1.13
- name: Run pip installs
run: |
python3 -m pip install --upgrade pip
python3 -m pip install -r tests/requirements.txt
python3 -m pip install numpy
python3 -m pip install scipy
- name: Show CMake version
run: cmake --version
- name: CMake Configure
run: >
cmake -S . -B .
-DPYBIND11_WERROR=ON
-DPYBIND11_SIMPLE_GIL_MANAGEMENT=OFF
-DDOWNLOAD_CATCH=ON
-DDOWNLOAD_EIGEN=ON
-DCMAKE_CXX_COMPILER=clang++
-DCMAKE_CXX_STANDARD=17
-DPYTHON_EXECUTABLE=$(python3 -c "import sys; print(sys.executable)")
- name: Build
run: cmake --build . -j 2
- name: Python tests
run: cmake --build . --target pytest -j 2
- name: C++ tests
run: cmake --build . --target cpptest -j 2
- name: Interface test
run: cmake --build . --target test_cmake_build -j 2
- name: Clean directory
run: git clean -fdx

Some files were not shown because too many files have changed in this diff.