This bumps the docker image used to build GPU applications for input to
GPUFS simulations from ROCm 5.4.2 to ROCm 6.0.2 and Ubuntu from 20.04 to
22.04. This matches the versions in gem5-resources#29.
Several notes were added to the Dockerfile to describe where the RUN
commands come from. A README.md is also added to clarify that this is
not a disk image for GPUFS and is only used to build applications.
Change-Id: I9ada99e2ed1854cb7adb76f2a1fa662bab398f86
The GPU device currently supports large BAR which means that the driver
can write directly to GPU memory over the PCI bus without using SDMA or
PM4 packets. The gem5 PCI interface only provides an atomic interface
for BAR reads/writes, which means the values cannot go through timing
mode Ruby caches. This causes bugs as the TCC cache is allowed to keep
clean data between kernels for performance reasons. If there is a BAR
write directly to memory bypassing the cache, the value in the cache is
stale and must be invalidated.
In this commit a TCC invalidate is generated for all writes over PCI
that go directly to GPU memory. This will also invalidate TCP along the
way if necessary. This currently relies on the driver synchronization
which only allows BAR writes in between kernels. Therefore, the cache
should only be in I or V state.
To handle a race condition between invalidates and launching the next
kernel, the invalidates return a response and the GPU command processor
will wait for all TCC invalidates to be complete before launching the
next kernel.
This fixes issues with stale data in nanoGPT and possibly PENNANT.
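The ordering constraint described above (no kernel launch until every TCC invalidate has responded) can be illustrated with a small stand-alone Python model. This is only a sketch: `ToyCommandProcessor`, `bar_write`, `invalidate_done` and `try_launch` are hypothetical names chosen for illustration and do not correspond to gem5 classes or methods.

```python
class ToyCommandProcessor:
    """Toy model: kernel launches wait for outstanding TCC invalidates."""

    def __init__(self):
        self.pending_invalidates = 0  # invalidates sent, response not yet seen
        self.queued_kernels = []      # kernels waiting to launch
        self.launched = []            # kernels launched, in order

    def bar_write(self, addr):
        # A BAR write bypasses the caches, so one invalidate is issued for it.
        self.pending_invalidates += 1

    def invalidate_done(self):
        # Response for one invalidate; retry any deferred launches.
        assert self.pending_invalidates > 0
        self.pending_invalidates -= 1
        self._drain()

    def try_launch(self, kernel):
        self.queued_kernels.append(kernel)
        self._drain()

    def _drain(self):
        # Launch only once every invalidate has been acknowledged.
        while self.queued_kernels and self.pending_invalidates == 0:
            self.launched.append(self.queued_kernels.pop(0))


cp = ToyCommandProcessor()
cp.bar_write(0x1000)
cp.bar_write(0x2000)
cp.try_launch("kernel_a")          # deferred: two invalidates outstanding
launched_early = list(cp.launched)
cp.invalidate_done()
cp.invalidate_done()               # last response releases the kernel
launched_late = list(cp.launched)
```

Counting outstanding invalidates, rather than tracking them individually, is enough here because the launch only needs to know whether any are still in flight.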
There was some inconsistency in the GitHub Workflow files on using
'ubuntu-latest' (which gets the latest Ubuntu version) or
'ubuntu-22.04'. To keep things consistent, 'ubuntu-latest' is now used in
all cases. This also saves us from updating the workflows upon release of a
new Ubuntu version.
gem5.fast does not currently build if the GPU model is built. This fixes
the array-bounds warnings, allowing gem5.fast to build again.
Change-Id: I463c2847c3ecfd2257a70418fa247090b0493f9b
v3.0.0 of pre-commit/action caused a deprecation warning in actions.
v3.0.1 was released to deal with this.
Change-Id: Ib5654e465565ad4356754ac097983aec4166b98f
We only test the latest LTS Ubuntu release with min-deps. With 24.04, we
no longer require the 22.04 min dependencies image.
Change-Id: I4b3d668c1f9d10c2b6071848e6daada6c763b5e7
This change ensures all our tests run on our most recent supported LTS
release of Ubuntu.
In the case of compiler tests we still test 22.04 all-dep but test 24.04
all-dep and min-dep (i.e., we drop 22.04 min-dep as it's somewhat
redundant).
Change-Id: I63666d1017594b496523a48e5112a8994f57885f
Specifying a DevContainer in gem5 allows users to quickly create an
environment in which they can develop, build, and run gem5. The
".devcontainer/devcontainer.json" file specifies the properties of the
container. In this commit they are as follows:
1. The Docker image ghcr.io/gem5/devcontainer. This is built from
"util/dockerfiles/devcontainer". This Dockerfile provides all
dependencies and a pre-built gem5 binary from the current main branch
(added to "/usr/local/bin"). In order to support this Docker container
on different platforms we use the Docker multi-platform feature. As
such, this must be built using `docker buildx bake devcontainer --push`
which reads the `docker-bake.hcl` file for the specification of the
multi-platform image.
2. Visual Studio Code extensions. This is a list of Visual Studio Code
extensions useful when developing gem5. They are automatically added to the
Visual Studio Code dev container.
3. Features. Features are enhancements that can be added to a
DevContainer. Normally they are libraries and other commonly used tools
to be included in the container. As we have our dependencies specified
in the Dockerfile, here we select one to enable Docker inside the
container, one to enable the GitHub CLI, one to improve linting, and
finally one to enable the VS Code CLI.
4. The On Create Command. This allows us to specify commands to
be run after the DevContainer is created. In this case we execute
".devcontainer/on-create.sh" which, right now, refreshes the git index
and installs the pre-commit checks.
Fix issue #1004. When enabling SMT with the O3 cpu, only the first
interrupts object was getting initialized properly. This patch
initializes all interrupts objects, one per SMT thread.
Change-Id: I300782b645bd8ea3ef2497278fb73125ab4bf495
This commit adds more detailed instruction types for RISC-V Vector.
Concretely, it substitutes VectorIntegerArith, VectorFloatArith,
VectorIntegerReduce and VectorFloatReduce with more specific types
related to the operation that each instruction performs (e.g.,
VectorIntegerAdd or VectorIntegerMult).
Additionally, it fixes two RISC-V instruction types (VectorXXX) that were
used in ARM SVE, replacing them with the proper SimdXXX ones.
Change-Id: I31774fa6a7cd249abfffec68d11d3d77f08ad70b
CC @adriaarmejach
Add a generic cache template to construct internal storage structures.
Also add some example use cases by converting the prefetcher tables to
use this new library.
AssociativeSet can reuse most of the generic cache library code with the
addition of a secure bit. This reduces duplicated code.
Change-Id: I008ef79b0dd5f95418a3fb79396aeb0a6c601784
The tagged entry can be derived from the generic cache entry and add the secure
flag that it needs. This reduces code duplication.
Change-Id: I7ff0bddc40604a8a789036a6300eabda40339a0f
The DCPT table is better built using the generic cache library since we do not
need the secure bit.
Change-Id: I8a4a8d3dab7fbc3bbc816107492978ac7f3f5934
The frequency table is better built using the generic cache library instead of the
AssociativeSet since the secure bit is not needed for this structure.
Change-Id: Ie3b6442235daec7b350c608ad1380bed58f5ccf4
Add a generic cache library modeled after AssociativeSet that can be used for
constructing internal caching structures.
Change-Id: I1767309ed01f52672b32810636a09142ff23242f
Now that we are able to provide a view of the cache hierarchy from
the Python world, we can start generating DTB entries for caches
and, more specifically, properly fill the next-level-cache and
cache-level properties.
Change-Id: Iba9ea08fe605f77a353c9e64d62b04b80478b4e2
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
One of the things we miss in gem5 is the capability to neatly compose the
cache hierarchy of CPUs and clusters of CPUs. The BaseCPU
addPrivateSplitL1Caches and addTwoLevelCacheHierarchy APIs have
historically been used to bind cache levels together.
These APIs have been superseded by the introduction of the Cache
hierarchy abstraction in the standard library. The standard library
makes it cleaner for a user to quickly instantiate a hierarchy of caches
with few lines of code. While this removes a lot of complexity for a
user, the Hierarchy objects still have little information about their
internal topology.
To address this problem, this patch adds a tree data structure to the
AbstractCacheHierarchy class, where every node of the tree represents
a cache in the hierarchy. In this way we will expose APIs for traversing
and querying the tree.
For example, a 2-CPU system with private L1, private L2 and shared L3
will contain the following tree:
       [root]
         |
        [L3]
         /\
        /  \
     [L2]  [L2]
      |      |
     [L1]  [L1]
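A minimal Python sketch of such a tree, for illustration only. `CacheNode`, `walk` and `level` are hypothetical names and are not the API added to AbstractCacheHierarchy; the sketch only shows how parent/child links make the topology traversable and queryable.

```python
class CacheNode:
    """One node of the hierarchy tree; the root is a placeholder, not a cache."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def level(self):
        # Caches closest to the CPU (leaves) are level 1; each parent is one
        # level above its deepest child.
        return 1 + max((c.level() for c in self.children), default=0)


def walk(node):
    """Pre-order traversal: a node first, then each of its subtrees."""
    yield node
    for child in node.children:
        yield from walk(child)


# The 2-CPU example: private L1, private L2, shared L3.
root = CacheNode("root")
l3 = CacheNode("L3", root)
l2_0 = CacheNode("L2", l3)
l2_1 = CacheNode("L2", l3)
l1_0 = CacheNode("L1", l2_0)
l1_1 = CacheNode("L1", l2_1)

names = [n.name for n in walk(root)]
```

With the parent link in place, queries such as "which cache sits behind this L1" reduce to following `parent`, and the level of a cache can be derived from the shape of the tree instead of being configured separately.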
Change-Id: I78fe6ad094f0938ff9bed191fb10b9e841418692
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The GPU device currently supports large BAR which means that the driver
can write directly to GPU memory over the PCI bus without using SDMA or
PM4 packets. The gem5 PCI interface only provides an atomic interface
for BAR reads/writes, which means the values cannot go through timing
mode Ruby caches. This causes bugs as the TCC cache is allowed to keep
clean data between kernels for performance reasons. If there is a BAR
write directly to memory bypassing the cache, the value in the cache is
stale and must be invalidated.
In this commit a TCC invalidate is generated for all writes over PCI
that go directly to GPU memory. This will also invalidate TCP along the
way if necessary. This currently relies on the driver synchronization
which only allows BAR writes in between kernels. Therefore, the cache
should only be in I or V state.
To handle a race condition between invalidates and launching the next
kernel, the invalidates return a response and the GPU command processor
will wait for all TCC invalidates to be complete before launching the
next kernel.
This fixes issues with stale data in nanoGPT and possibly PENNANT.
Change-Id: I8e1290f842122682c271e5508a48037055bfbcdf
The GPUDynInst for sending memory requests through the CU's data port
is required but only used for DPRINTFs. Relax this constraint so that
the methods can be reused for requests such as probes generated by the
GPU device.
Change-Id: I16094e400968225596370b684d6471580888d98a
This PR offloads some of the partitioning logic to the partitioning
manager, effectively changing the partitioning interface. Rather than
always relying on the PartitionFieldExtention data structure to convey
partition IDs, we make it implementation defined by introducing the
partitioning manager abstraction.
We want users to be able to extract the partition ID more flexibly, and
this requires using a SimObject.
Users can extend the PartitionManager, overriding
readPacketPartitionID, thereby providing their own means of
injecting/extracting partitioning data from a packet.
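The extension point can be pictured with a rough Python analogue. The real interface is a C++ SimObject; every class below (`ToyPacket`, `ToyPartitionManager` and its two subclasses) is a hypothetical stand-in written only to show the override pattern, not gem5 code.

```python
class ToyPacket:
    """Stand-in for a memory packet: an address plus optional extensions."""

    def __init__(self, addr, extensions=None):
        self.addr = addr
        self.extensions = extensions or {}


class ToyPartitionManager:
    """Base class: subclasses decide how a partition ID is obtained."""

    def read_packet_partition_id(self, pkt):
        raise NotImplementedError


class ExtensionPartitionManager(ToyPartitionManager):
    """Reads the ID from data attached to the packet (default 0)."""

    def read_packet_partition_id(self, pkt):
        return pkt.extensions.get("partition_id", 0)


class AddressPartitionManager(ToyPartitionManager):
    """Derives the ID from the address instead of any attached data."""

    def __init__(self, boundary):
        self.boundary = boundary

    def read_packet_partition_id(self, pkt):
        return 0 if pkt.addr < self.boundary else 1


pkt = ToyPacket(0x9000, {"partition_id": 3})
id_from_extension = ExtensionPartitionManager().read_packet_partition_id(pkt)
id_from_address = AddressPartitionManager(0x8000).read_packet_partition_id(pkt)
```

The code that consumes the partition ID only ever calls the manager, so the way the ID is carried (attached data, address bits, or something else) can change without touching the caller.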
The M5Ops C/C++ functions partially use 64-bit arguments and return values.
In general, 64-bit arguments and return values are possible on 32-bit
RISC-V systems as well, since each such argument or return value is
split across two registers. However, at the moment, this does not work for
32-bit RISC-V systems on the simulator side, since there is a one-to-one
mapping between argument registers and m5op function parameters.
To solve this problem, the get() function of the RISC-V reg_abi is
updated. It now merges two registers if there is a 64-bit argument.
For this, the function code has to be passed to the get() function. The
default value of this function code is set to 0xF00, since 0x00 is
already used for M5_ARM. The parameter lists of the other get() functions
are also extended by this function code parameter, marked with the
[[maybe_unused]] attribute.
To enable a 64-bit return value, a0 is assigned the lower 32 bits and a1
the upper 32 bits.
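The register arithmetic involved can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the reg_abi code: `merge_regs`, `split_ret` and `gather_args` are hypothetical helpers, the list of argument widths stands in for whatever the real get() derives from the function code, and the lower half of an argument is assumed to sit in the lower-numbered register (mirroring the a0/a1 split described for return values).

```python
MASK32 = 0xFFFFFFFF


def merge_regs(lo, hi):
    """Combine two 32-bit register values into one 64-bit value."""
    return ((hi & MASK32) << 32) | (lo & MASK32)


def split_ret(value):
    """Split a 64-bit return value into (a0, a1) = (lower, upper) halves."""
    return value & MASK32, (value >> 32) & MASK32


def gather_args(regs, widths):
    """Read arguments from consecutive 32-bit registers.

    widths holds the bit width (32 or 64) of each argument; a 64-bit
    argument consumes two registers instead of one.
    """
    args = []
    i = 0
    for width in widths:
        if width == 64:
            args.append(merge_regs(regs[i], regs[i + 1]))
            i += 2
        else:
            args.append(regs[i] & MASK32)
            i += 1
    return args


# A 64-bit argument in the first two registers, then a 32-bit argument.
regs = [0xDEADBEEF, 0x00000001, 0x00000005]
args = gather_args(regs, [64, 32])
a0, a1 = split_ret(0x1DEADBEEF)
```

The one-to-one mapping mentioned above corresponds to advancing `i` by one for every argument; consuming two registers for a 64-bit argument is the whole of the change in this toy version.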
Related Issue: https://github.com/gem5/gem5/issues/881
The new ISA-agnostic interface is the PartitionManager.
We therefore make the PartitionFieldExtention private to the
Arm implementation of memory partitioning (FEAT_MPAM).
Any other partitioning implementation should override
PartitionManager::readPacketPartitionID to provide a means
of extracting partitioning data (partition_id) from the
incoming Packet.
With this commit we also define an MPAM MSC, which is
intended to be the partitioning manager for the
Memory System Component.
Change-Id: I6959ace0c0cbca549dcc1aacd53dff223b5fe328
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This SimObject can be used to quickly test that the statistics are
functioning correctly. The SimObject schedules a single event which sets
the statistics to values dependent on the SimObject params.
With this commit the "Scalar" stats have a StatTester subclass that can
be used for testing. More can be added as required.
Tests are included to check our Scalar SimStat functionality.
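The mechanism (one scheduled event that copies a parameter into a statistic) can be mimicked outside gem5 with a toy event queue. The sketch below is purely illustrative: `ToyEventQueue` and `ToyScalarStatTester` are hypothetical classes, not the SimObjects added by this commit, and the `value` and `when` arguments are assumed parameter names.

```python
import heapq


class ToyEventQueue:
    """Minimal discrete-event queue ordered by tick."""

    def __init__(self):
        self._events = []
        self._seq = 0   # tie-breaker so callbacks are never compared
        self.now = 0

    def schedule(self, when, callback):
        heapq.heappush(self._events, (when, self._seq, callback))
        self._seq += 1

    def run(self):
        while self._events:
            when, _, callback = heapq.heappop(self._events)
            self.now = when
            callback()


class ToyScalarStatTester:
    """Schedules a single event that sets a scalar stat from a parameter."""

    def __init__(self, queue, value, when=0):
        self.stat = 0
        self._value = value
        queue.schedule(when, self._set_stat)

    def _set_stat(self):
        self.stat = self._value


queue = ToyEventQueue()
tester = ToyScalarStatTester(queue, value=42, when=10)
stat_before = tester.stat
queue.run()
stat_after = tester.stat
```

Because the final value depends only on the parameters, a test can compare the reported statistic against a value it already knows.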
With this change, the SimObjects defined in "src/test_objects" are only
compiled into the gem5 binary if the Kconfig option 'USE_TEST_OBJECTS' ==
'y'. This happens in two cases:
1. When 'ALL/gem5' is compiled via "build_opts".
2. When tests are run via "./tests/main.py".
Change-Id: I2330008fd7c7900de5f4de142b8ac89ef4e351ce
This SimObject can be used to quickly test that the statistics are
functioning correctly. The SimObject schedules a single event which sets
the statistics to values dependent on the SimObject params.
With this commit the "Scalar" stats have a StatTester subclass that can
be used for testing. More can be added as required.
Tests are included to check our Scalar SimStat functionality.
Change-Id: I78fa5d9a0c3fc7115bd6c6d3410a5436aaa47f55