This is not needed with upload-artifact v4 directories are archived and
compressed by default.
This zip step was also causing Daily/Weekly test failures due to not
running `apt update` before the `apt install` for the zip utility. Ergo
this patch fixes these errors.
A user reported a bug with the SSE4.1 version of memcmp in libc. When
enabled the simulated program crashes with SIGILL. After attempting all
fixes recommended by Intel SDM and still not working, turning the bit
off instead.
Similar, the default XSAVE functionality is not completely implemented
for AVX and newer ISA extensions. Therefore, there is not much point to
claiming to support the more advanced versions of XSAVE (XSAVEOPT,
XSAVEC, XSAVES, and XGETBV with ECX=1).
Note that none of these bits are enabled for non-GPU full system
simulations (see src/arch/x86/X86ISA.py). This only impacts GPUFS
simulations.
Change-Id: I8eb7bf0f2a0a29226095e7889fec9c1e8a65f88f
Due to an oversight, the PyUnit tests were not being run as part of the
gem5 CI tests. This was because they are located in "tests/pyunit"
instead of "tests/gem5", where the CI GitHub Action workflow searched
for tests to run and where all other tests reside.
This adds the Pyunit tests as a seperate job in the CI GitHub Action's
workflow.
Due to an oversight, the PyUnit tests were not being run as part of the
gem5 CI tests. This was because they are located in "tests/pyunit"
instead of "tests/gem5", where the CI GitHub Action workflow searched
for tests to run and where all other tests reside.
This adds the Pyunit tests as a seperate job in the CI GitHub Action's
workflow.
Change-Id: I63d93571fde11c19bf3d281c034eddf4b455ae4e
This PR introduces a missing pice of far atomic implementation. This
pull request incorporates several changes:
- Enable 2-level and 4-level (and N-level) cache hierarchies, removing
Atomic_NoWait transactions
- Fix Unique Near policy implementation that raised abort
- Add support for alloc_on_atomic == False. Enables Far Atomics on
systems where the HNF does not allocate evicted lines at LLC (Like in
WriteUpdate).
Movfp instruction did not account for only copying the lower half of src
register if dataSize is 4.
GitHub Issue: #893
I used the test code in issue #893 to verify the fix is working.
Hi, we've noticed some issues with the Uart8250 device when using it as
the Linux console. Sometimes the Uart interrupt would remain constantly
posted, so Linux would continue to try and handle it, effectively
resulting in an infinite loop. With this patch, I'm no longer seeing any
issues, but my testing has been limited to configurations and workloads
we're interested in at Imagination, so please let me know if there's
some other tests I should run or if you notice any other issues.
This patch fixes several issues with interrupt posting and clearing in
the uart8250 device.
The "status" member variable and the console interrupt should be kept in
sync. However, in one code path in readIir, the interrupt bit was being
cleared in the status variable but not in the platform controller.
Additionally, in some code paths, the interrupts would be cleared in the
status variable and in the interrupt controller, but a future interrupt
would remain scheduled, causing a spurious interrupt and setting a bit
in status to 1.
These issues can confuse the kernel and result in an ininite interrupt
handling loop.
Another issue is related to the fact that there are two interrupt causes
(TX and RX) and both of them can be valid at the same time. When one of
them becomes no longer valid, we should check the status of the other
one before clearing the interrupt.
This patch addresses the issues listed above and refactors the interrupt
clearing logic to reduce repetition.
Fixes#1033
In the BaseCPU object _uncached_interrupt_response_ports is a class
variable, not an instance variable. #1004 changed the explicit
self._uncached_interrupt_response_ports to use extend. This caused the
list of ports to be extended *for all cores*, which caused problems when
using a system with more than 1 core.
This reverts the `extend` part of the change, but keeps the rest.
Change-Id: I6dc7d6da6763048d82960229d34933a3a2ac36e0
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
This bumps the docker image used to build GPU applications for input to
GPUFS simulations from ROCm 5.4.2 to ROCm 6.0.2 and Ubuntu from 20.04 to
22.04. This matches the versions in gem5-resources#29 .
Several notes were added to the Dockerfile to describe where the RUN
commands come from. A README.md is also added to clarify that this is
not a disk image for GPUFS and is only used to build applications.
Change-Id: I9ada99e2ed1854cb7adb76f2a1fa662bab398f86
The GPU device currently supports large BAR which means that the driver
can write directly to GPU memory over the PCI bus without using SDMA or
PM4 packets. The gem5 PCI interface only provides an atomic interface
for BAR reads/writes, which means the values cannot go through timing
mode Ruby caches. This causes bugs as the TCC cache is allowed to keep
clean data between kernels for performance reasons. If there is a BAR
write directly to memory bypassing the cache, the value in the cache is
stale and must be invalidated.
In this commit a TCC invalidate is generated for all writes over PCI
that go directly to GPU memory. This will also invalidate TCP along the
way if necessary. This currently relies on the driver synchonization
which only allows BAR writes in between kernels. Therefore, the cache
should only be in I or V state.
To handle a race condition between invalidates and launching the next
kernel, the invalidates return a response and the GPU command processor
will wait for all TCC invalidates to be complete before launching the
next kernel.
This fixes issues with stale data in nanoGPT and possibly PENNANT.
There was some inconsistency in the GitHub Workflow files on using
'ubuntu-latest' (which gets the latest Ubuntu version) or
'ubuntu-22.04'. To keep things consistent 'ubuntu-latest' is now used in
all cases. This also saves us updating workloads upon release of a new
Ubuntu version.
gem5.fast does not currently build if the GPU model is built. This fixes
the array-bounds warnings allowing gem5.fast to build again.
Change-Id: I463c2847c3ecfd2257a70418fa247090b0493f9b
v3.0.0 of pre-commit/action caused a deprecation warning in actions.
v3.0.1 was released to deal with this.
Change-Id: Ib5654e465565ad4356754ac097983aec4166b98f
We only test the latest LTS Ubuntu release with min-deps. With 24.04, we
no longer require the 22.04 min dependencies image.
Change-Id: I4b3d668c1f9d10c2b6071848e6daada6c763b5e7
This change ensures all our tests run on our most recent supported LTS
release of Ubuntu.
In the case of compiler tests we still test 22.04 all-dep but test 24.04
all-dep and min-dep (i.e., we drop 22.04 min-dep as it's somewhat
redundant).
Change-Id: I63666d1017594b496523a48e5112a8994f57885f
Speciftying a DevContainer in gem5 allows for users to quickly create an
environment in which they can develop, build, and run gem5. The
".devcontainer/devcontainer.json" file specifies the properties of the
container. In this commit they are as follows:
1. The Docker image ghcr.io/gem5/devcontainer. This is built from
"util/dockerfiles/devcontainer". This Dockerfile provides all
dependencies and a pre-built gem5 binary from the current main branch
(added to "/usr/local/bin"). In order to support this Docker container
on different platforms we use the Docker multi-platform feature. As
such, this must be built using `docker buildx bake devcontainer --push`
which reads the `docker-bake.hcl file for the specification of the
multi-platform image.
2. Visual Studio extensions. This is a list of Visual Studio Code
extensions useful when developing gem5. They are automatically added the
Visual Studio dev container.
3. Features. Features are enhancemets that can be added to a
DevContainer. Normally they are libraries and other commonly used tools
to be included in the Container. As we have our dependencies specified
in the Dockerfile here we select one to enable Docker inside the
container, one to enable the Github CLI, one to improve Linting, and
finally one to enable the vscode CLI.
4. The On Create Command : This command allows us to specify commands to
be run after the DevContainer is created. In this case we execute
".devcontainer/on-create.sh" which, right now, refreshes the git index
and installed the pre-commit checks.
Fix issue #1004. When enabling SMT with the O3 cpu, only the first
interrupts object was getting initialized properly. This patch
initializes all interrupts objects, one per SMT thread.
Change-Id: I300782b645bd8ea3ef2497278fb73125ab4bf495
This commit adds more detailed instruction types for RISC-V Vector.
Concretely, it substitutes VectorIntegerArith, VectorFloatArith,
VectorIntegerReduce and VectorFloatReduce with more specific types
related to the operation that each instruction (e.g., VectorIntegerAdd
or VectorIntegerMult).
Additionaly, fixes two RISC-V instruction types (VectorXXX) that were
used in ARM SVE, placing the proper SimdXXX ones.
Change-Id: I31774fa6a7cd249abfffec68d11d3d77f08ad70b
CC @adriaarmejach
Add a generic cache template to construct internal storage structures.
Also add some example use cases by converting the prefetcher tables to
use this new library.
AssociativeSet can reuse most of the generic cache library code with the
addition of a secure bit. This reduces duplicated code.
Change-Id: I008ef79b0dd5f95418a3fb79396aeb0a6c601784
The tagged entry can be derived from the generic cache entry and add the secure
flag that it needs. This reduces code duplication.
Change-Id: I7ff0bddc40604a8a789036a6300eabda40339a0f
The DCPT table is better built using the generic cache library since we do not
need the secure bit.
Change-Id: I8a4a8d3dab7fbc3bbc816107492978ac7f3f5934
The frequency table is better built using the generic cache library instead of the
AssociativeSet since the secure bit is not needed for this structure.
Change-Id: Ie3b6442235daec7b350c608ad1380bed58f5ccf4
Add a generic cache library modeled after AssociativeSet that can be used for
constructing internal caching structures.
Change-Id: I1767309ed01f52672b32810636a09142ff23242f
Now that we are able to provide a view of the cache hierarchy from
the python world, we can start generating DTB entries for caches
and more specifically to properly fill the next-level-cache and
cache-level properties
Change-Id: Iba9ea08fe605f77a353c9e64d62b04b80478b4e2
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
One of things we miss in gem5 is the capability to neatly compose the
cache hierarchy of CPUs and clusters of CPUs. The BaseCPU
addPrivateSplitL1Caches and addTwoLevelCacheHierarchy APIs have
historically been used to bind cache levels together.
These APIs have been superseeded by the introduction of the Cache
hierarchy abstraction in the standard library. The standard library
makes it cleaner for a user to quickly instantiate a hierarchy of caches
with few lines of code. While this removes a lot of complexity for a
user, the Hierarchy objects still have little information about their
internal topology.
To address this problem, this patch adds a tree data structure to the
AbstractCacheHierarchy class, where every node of the tree represent
a cache in the hierarchy. In this way we will expose APIs for traversing
and querying the tree.
For example a 2 CPUs system with private L1, private L2 and shared L3
will contain the following tree:
[root]
|
[L3]
/\
/ \
[L2] [L2]
| |
[L1] [L1]
Change-Id: I78fe6ad094f0938ff9bed191fb10b9e841418692
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The GPU device currently supports large BAR which means that the driver
can write directly to GPU memory over the PCI bus without using SDMA or
PM4 packets. The gem5 PCI interface only provides an atomic interface
for BAR reads/writes, which means the values cannot go through timing
mode Ruby caches. This causes bugs as the TCC cache is allowed to keep
clean data between kernels for performance reasons. If there is a BAR
write directly to memory bypassing the cache, the value in the cache is
stale and must be invalidated.
In this commit a TCC invalidate is generated for all writes over PCI
that go directly to GPU memory. This will also invalidate TCP along the
way if necessary. This currently relies on the driver synchonization
which only allows BAR writes in between kernels. Therefore, the cache
should only be in I or V state.
To handle a race condition between invalidates and launching the next
kernel, the invalidates return a response and the GPU command processor
will wait for all TCC invalidates to be complete before launching the
next kernel.
This fixes issues with stale data in nanoGPT and possibly PENNANT.
Change-Id: I8e1290f842122682c271e5508a48037055bfbcdf
The GPUDynInst for sending memory requests through the CUs data port
is required but only used for DPRINTFs. Relax this constraint so that
the methods can be reused for requests such as probes generated by the
GPU device.
Change-Id: I16094e400968225596370b684d6471580888d98a