20933 Commits

Author SHA1 Message Date
Roger Chang
3b06925408 scons: Update Kconfig description
Change-Id: I69206fb9881bc0d53660bbd1cf8fc225ead9fea3
2023-11-23 08:26:11 +08:00
Roger Chang
d758df4b5c scons: Update the Kconfig build options
The CL updates the Kconfig:
1. Replace the USE_NULL_ISA with BUILD_ISA
2. The USE_XXX_ISAs depend on BUILD_ISA
3. If the BUILD_ISA is set, at least one of USE_XXX_ISAs must be set
4. Refactor the USE_KVM option

Change-Id: I2a600dea9fb671263b0191c46c5790ebbe91a7b8
2023-11-23 08:26:11 +08:00
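The constraints listed in d758df4b5c can be read as a small validity check on a configuration. The following is a minimal Python sketch of that reading, not the Kconfig source itself. The dictionary representation and the `check_isa_config` helper are hypothetical, and the `USE_ARM_ISA` / `USE_X86_ISA` names stand in for the `USE_XXX_ISA` options.

```python
def check_isa_config(conf):
    """Return a list of problems for a dict of option name -> bool.

    Hypothetical helper: it only mirrors points 2 and 3 of the commit
    message and does not model the refactored USE_KVM option.
    """
    problems = []
    build_isa = conf.get("BUILD_ISA", False)
    enabled = sorted(
        name
        for name, value in conf.items()
        if name.startswith("USE_") and name.endswith("_ISA") and value
    )
    # Point 2: the USE_XXX_ISA options depend on BUILD_ISA.
    if enabled and not build_isa:
        problems.append("ISA options set without BUILD_ISA: " + ", ".join(enabled))
    # Point 3: with BUILD_ISA set, at least one USE_XXX_ISA must be set.
    if build_isa and not enabled:
        problems.append("BUILD_ISA is set but no USE_XXX_ISA option is set")
    return problems
```

For example, `check_isa_config({"BUILD_ISA": True})` reports one problem, while adding `"USE_ARM_ISA": True` to that dictionary yields an empty list.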
Gabe Black
d37673be9f scons: Remove the default-default build target.
In gem5, there are many equally valid and equally useful top level
targets which the user might want. It no longer makes sense to
arbitrarily pick one to be the default target. It makes sense to force
the user to actually specify what they want, instead of assuming it
must be the ARM debug binary.

There is currently an M5_DEFAULT_BINARY environment variable which
will change what the default binary is, if set. This change leaves
that in place, but removes the default-default, or in other words the
default that is used if M5_DEFAULT_BINARY is not set.

This way if the user knows what default they want, they can specify it
locally in their environment and avoid having to type it over and over
again, but we're not making an arbitrary choice at a more global level
without the context to know what actually makes sense.

Change-Id: I886adb1289b9879d53387250f950909a4809ed8b
2023-11-23 08:26:11 +08:00
Gabe Black
63919f6a68 scons: Hook up oldconfig and olddefconfig.
These two utilities help update an old config to add settings for new
config options. The difference between them is that oldconfig asks what
new settings you want to use, while olddefconfig automatically picks the
defaults.

Change-Id: Icd3e57f834684e620705beb884faa5b6e2cc7baa
2023-11-23 08:26:11 +08:00
Gabe Black
ec76214f68 scons: Hook up the savedefconfig kconfig helper.
This helper utility lets you save the defconfig which would give rise to
a given config. For instance, you could use menuconfig to set up a
config how you want it with the options you cared about configured, and
then use savedefconfig to save a defconfig of that somewhere to the
side, in the gem5 defconfig directory, etc. Then later, you could use
that defconfig to set up a new build directory with that same config,
even if the kconfig options have changed a little bit since then.

A saved defconfig like that can also be a good way to visually see what
options have been set to something interesting, and an easier way to
pass a config to someone else to use, to put in bug reports, etc.

Change-Id: Ifd344278638c59b48c261b36058832034c009c78
2023-11-23 08:26:11 +08:00
Gabe Black
51b8cfcede scons: Hook up the kconfig guiconfig program.
Change-Id: I0563a2fb2d79cea5974aeaf65a400be5ee51dc63
2023-11-23 08:26:11 +08:00
Gabe Black
91b3da016b scons: Hook in the listnewconfig kconfig helper.
This helper lists config options which are new in the Kconfig and which
are not currently set in the config file.

Change-Id: I0c426d85c0cf0d2bdbac599845669165285a82a0
2023-11-23 08:26:11 +08:00
Gabe Black
083bca1e23 scons: Hook in the kconfig setconfig utility.
This little utility lets you set particular values in an existing config
without having to open up the whole menuconfig interface.

Also reorganize things in kconfig.py a little to help share code between
wrappers.

Change-Id: I7cba0c0ef8d318d6c39e49c779ebb2bbdc3d94c8
2023-11-23 08:26:11 +08:00
Gabe Black
1ae2dfcc56 scons: Add a mechanism to manually defconfig a build dir.
This will let you specify *any* defconfig file, instead of implicitly
selecting one from the defconfig directory based on the variant name.

Change-Id: I74c981b206849f08e60c2df702c06534c670cc7c
2023-11-23 08:26:11 +08:00
Gabe Black
1e84d9f941 scons: Add a mechanism to run menuconfig to set up a build dir.
If you call scons with the first argument set to menuconfig, that means
to run menuconfig on the path following it. Or in other words, if
you ran this command:

scons menuconfig build/foo/bar

That would tell SCons to set up a build directory at the path
build/foo/bar, and then invoke menuconfig so you can set up its
configuration.

In addition to using this mechanism to set up a new build directory, you
can also use it to reconfigure an existing directory.

This supplements and does not replace the existing mechanism of using
"build/${VARIANT}" to select a config with defconfig.

Change-Id: Ief8e8c2ee6477799455c2004bef06c64be5cc1db
2023-11-23 08:26:11 +08:00
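The argument convention described in 1e84d9f941 (the first argument names the tool, the path following it is the build directory) can be sketched as below. This is an illustration under stated assumptions, not the code in the SConstruct: `menuconfig_build_dir` is a hypothetical helper, and raising `ValueError` for a missing path is a choice made for this sketch only.

```python
def menuconfig_build_dir(args):
    """Return the build directory if args start with 'menuconfig', else None.

    `args` is the list of command-line targets, e.g. what follows `scons`.
    """
    if not args or args[0] != "menuconfig":
        # Not a menuconfig request; the usual build/${VARIANT} handling
        # would apply instead.
        return None
    if len(args) < 2:
        raise ValueError("menuconfig needs a build directory path after it")
    return args[1]
```

With this sketch, `menuconfig_build_dir(["menuconfig", "build/foo/bar"])` returns `"build/foo/bar"`, matching the command shown in the commit message.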
Gabe Black
f4c578f458 scons: Flesh out the help text for "magic" targets.
These targets are not necessarily obvious, and tell SCons to do useful
things, like build a particular version of the gem5 binary with a
particular configuration, or run the unit tests.

Add descriptions of these targets to the help so that they are much
more discoverable.

Change-Id: If84399be1a7155ff5f66f511efe1f1c241089c84
2023-11-23 08:26:10 +08:00
Gabe Black
1cdccd7ac0 scons: Add a build script for generating a root Kconfig file.
This root Kconfig file "source"s (includes) the base gem5 src/Kconfig
file, and also any optional Kconfig files found in the base of EXTRAS
directories. These will be called out in the menuconfig interface and
config files with the name of the EXTRAS directory they came from, and a
blank section will be present either if the Kconfig didn't exist, or it
did exist but had no options in it.

Change-Id: I54060d613f0e0ab9372bed37a2fe5849bf5bbcdb
2023-11-23 08:26:10 +08:00
Gabe Black
db3a6e8e84 scons: Use Kconfig to configure gem5.
These are not yet consumed by anything, but convert all the settings
from SCons variables to Kconfig variables.

If you have existing SConsopts files which need to be converted, you
should take a look at KCONFIG.md to learn about how kconfig is used in
gem5. You should decide if any variables need to be available to C++ or
kconfig itself, and whether those are options which should be detected
automatically, or should be up to the user. Options which should be
measured automatically should still be in SConsopts files, while user
facing options should be added to new or existing Kconfig files.

Generally, make sure you're storing c++/kconfig visible options in
env['CONF'][...]. Also remove references to sticky_vars since persistent
options should now be handled with kconfig, and export_vars since
everything in env['CONF'] is now exported automatically.

Switch SCons/gem5 to use Kconfig for configuration, except EXTRAS which
is still a sticky SCons variable. This is necessary because EXTRAS also
controls what config options exist. If it came from Kconfig itself, then
there would be a circular dependency. This dependency could
theoretically be handled by reparsing the Kconfig when EXTRAS
directories were added or removed, but that would be complicated, and
isn't supported by kconfiglib. It wouldn't be worth the significant
effort it would take to add it, just to use Kconfig more purely.

Change-Id: I29ab1940b2d7b0e6635a490452d05befe5b4a2c9
2023-11-23 08:26:10 +08:00
Gabe Black
5f73a9bbf0 scons: Use either the "build" or "gem5.build" as build anchor.
If gem5.build already exists within a directory, then that build
directory can be used without having to worry about variants.

If it doesn't exist and we find a build/${VARIANT} style path, then we
use that as the anchor.

In either case, the variant name is the final component of the build
path. The parse_build_path function had been separating that out, but it
was just put back onto the path again anyway by the only caller, and
then split out again when that path was consumed. We save a step by not
splitting it out in parse_build_path.

Change-Id: I8705b3dbb7664748f5525869cb188df70319d403
2023-11-23 08:26:10 +08:00
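The anchor lookup described in 5f73a9bbf0 can be sketched roughly as follows. This is a simplified illustration, not the real `parse_build_path`: the `find_build_dir` name, the injected `exists` predicate, and the exact search order are assumptions made so the example is self-contained and does not touch the file system.

```python
from pathlib import PurePosixPath


def find_build_dir(target, exists):
    """Return the build directory that anchors `target`, or None.

    `exists` is a callable taking a path and returning True if it exists.
    """
    path = PurePosixPath(target)
    # Case 1: an enclosing directory already contains "gem5.build".
    for parent in path.parents:
        if exists(parent / "gem5.build"):
            return parent
    # Case 2: fall back to a build/${VARIANT} style path, searching
    # backwards for a "build" component followed by a variant name.
    parts = path.parts
    for i in range(len(parts) - 2, -1, -1):
        if parts[i] == "build":
            return PurePosixPath(*parts[: i + 2])
    return None
```

In both cases the variant name is the final component of the returned path (its `.name`), so nothing needs to be split off and rejoined by the caller.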
Matthew Poremba
6e433ed885 mem-ruby: Fixes for new AtomicWait event in VIPER TCC (#585)
The AtomicWait event was not being woken up properly due to the
numPending count in the TBE not being decremented. This patch decrements
the count when Data is returned. Since that moves to a base state, the
TBE should no longer be needed.

Additionally, added a transition which stalls and waits when an AtomicWait
occurs while in WI state so that it retries.

Change-Id: Ic8bfc700f9df3f95bea0799121898926a23d8163
2023-11-22 14:05:43 -08:00
Bobby R. Bruce
23a22ed95c dev-amdgpu: Add VMID map to checkpoint (#570)
When restoring checkpoints for certain applications, gem5 tries to
create new doorbells with a pre-existing queue ID and simulation crashes
shortly after. This commit adds existing IDs to the GPU device's used
VMID map so that new doorbells are aware of existing queue IDs and use a
new ID. This ensures that queue IDs are unique after checkpoint
restoration
2023-11-22 10:05:21 -08:00
Hoa Nguyen
3009e0fb57 mem-ruby: Fix typo in CHI's Send_CompI (#579)
The destination for the response is set twice.
2023-11-20 21:38:13 -08:00
Bobby R. Bruce
d772f3967b dev: Fix std::min type mismatch in reg_bank.hh (#582)
https://github.com/gem5/gem5/pull/386 included two cases in
"src/dev/reg_bank.hh" where `std::min` was used to compare an integer
of type `size_t` and another of type `Addr`. This causes an error on my
Apple Silicon Mac as the comparison between an "unsigned long" and an
"unsigned long long" is not permitted. To fix this issue this patch
changes `reg_size` from `size_t` to `Addr`, as well as the types of
the values it was derived from and the variable used to hold the return
from the `std::min` calls. While not completely correct typing from a
labelling perspective (`reg_bytes` is not an address), functions in
"src/dev/reg_bank.hh" already abuse `Addr` in this way frequently (for
example, `bytes` in the `write` function).
2023-11-20 21:37:45 -08:00
Vishnu Ramadas
06161ded8c dev-amdgpu: Add VMID map to checkpoint
When restoring checkpoints for certain applications, gem5 tries to
create new doorbells with a pre-existing queue ID and simulation crashes
shortly after. This commit checkpoints the existing VMID map so that any
new doorbells after restoration use a unique queue ID

Change-Id: I9bf89a2769db26ceab4441634ff2da936eea6d6f
2023-11-20 21:19:17 -06:00
Bobby R. Bruce
08c0d1f27a dev: Fix std::min type mismatch in reg_bank.hh
https://github.com/gem5/gem5/pull/386 included two cases in
"src/dev/reg_bank.hh" where `std::min` was used to compare an integer
of type `size_t` and another of type `Addr`. This caused an error on my
Apple Silicon Mac as this is a comparison between an "unsigned long"
and an "unsigned long long" which (at least on my setup) was not
permitted. To fix this issue the `reg_size` was changed from `size_t` to
`Addr`, as well as the types of the values it was derived from and
the variable used to hold the return from the `std::min` calls.

Change-Id: I31e9c04a8e0327d4f6f5390bc5a743c629db4746
2023-11-20 17:33:44 -08:00
Matthew Poremba
3896673ddc util: Bump GPUFS build docker to 5.4.2 (#571)
This dockerfile is used to *build* applications (e.g., from
gem5-resources) which can be run using full system mode in a GPU build.
The next release's disk image will use ROCm 5.4.2, therefore bump the
version from 4.2 to that version.

Again this is used to *build* input applications only and is not needed
to run or compile gem5 with GPUFS. For example:

$ docker build -t rocm54-build .
/some/gem5-resources/src/gpu/lulesh$ docker run --rm -u $UID:$GID -v \
    ${PWD}:${PWD} -w ${PWD} rocm54-build make

Change-Id: If169c8d433afb3044f9b88e883ff3bb2f4bc70d2
2023-11-18 18:13:06 -08:00
Vishnu Ramadas
d19d6fc31e dev-amdgpu: Add PM4 queue ID to GPU used VMID map
When restoring checkpoints for certain applications, gem5 tries to
create new doorbells with a pre-existing queue ID and simulation crashes
shortly after. This commit adds existing IDs to the GPU device's used
VMID map so that new doorbells are aware of existing queue IDs and use a
new ID. This ensures that queue IDs are unique after checkpoint
restoration

Change-Id: I9bf89a2769db26ceab4441634ff2da936eea6d6f
2023-11-16 17:30:00 -06:00
Jason Lowe-Power
db6a869786 mem-cache: Prefetchers Improvements (#564)
This pull request contains a set of small patches which fix some bugs in
the gem5 prefetchers, and align out-of-the-box prefetcher performance
more closely with that which a typical user would expect.

The performance patches have been tested with an out-of-the-box
(untuned) Stride prefetcher configuration against a set of SPEC 2017
SimPoints, and show a modest IPC uplift across the board, with no IPC
degradation.

The new defaults were identified as part of work on gem5 prefetchers
undertaken by Nikolaos Kyparissas while on internship at Arm.
2023-11-16 15:22:26 -08:00
Giacomo Travaglini
4ca2efac16 mem-ruby: AtomicNoReturn should check comp_anr instead of comp_wu (#545)
The comp_anr parameter is currently unused. Both parameters (comp_wu and
comp_anr) are set to false by default

Change-Id: If09567504540dbee082191d46fcd53f1363d819f

Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2023-11-16 15:20:51 -08:00
Matthew Poremba
4965367724 mem-ruby, gpu-compute: fix SQC/TCP requests to same line (#540)
Currently, the GPU SQC (L1I$) and TCP (L1D$) have a performance bug
where they do not behave correctly when multiple requests to the same
cache line overlap one another.  The intended behavior is that if the
first request that arrives at the Ruby code for the SQC/TCP misses, it
should send a request to the GPU TCC (L2$).  If any requests to the
same cache line occur while this first request is pending, they should
wait locally at the L1 in the MSHRs (TBEs) until the first request has
returned.  At that point they can be serviced, and assuming the line
has not been evicted, they should hit.

For example, in the following test (on 1 GPU thread, in 1 WG):

load Arr[0]
load Arr[1]
load Arr[2]

The expected behavior (confirmed via profiling on real GPUs) is that
we should get 1 miss (Arr[0]) and 2 hits (Arr[1], Arr[2]) for such a
program.

However, the current support in the VIPER SQC/TCP code does not model
this correctly.  Instead it lets all 3 concurrent requests go straight
through to the TCC instead of stopping the Arr[1] and Arr[2] requests
locally while Arr[0] is serviced.  This causes all 3 requests to be
classified as misses.

To resolve this, this patch adds support into the SQC/TCP code to
prevent subsequent, concurrent requests to a pending cache line from being
sent in parallel with the original one.  To do this, we add an
additional transient state (IV) to indicate that a load is pending to
this cache line.  If a subsequent request of any kind to the same cache
line occurs while this load is pending, the requests are put on the
local wait buffer and woken up when the first request returns to the
SQC/TCP.  Likewise, when the first load is returned to the SQC/TCP, it
transitions from IV --> V.

As part of this support, additional transitions were also added to
account for corner cases such as what happens when the line is evicted
by another request that maps to the same set index while the first load
is pending (the line is immediately given to the new request, and when
the load returns it completes, wakes up any pending requests to the same
line, but does not attempt to change the state of the line) and how GPU
bypassing loads and stores should interact with the pending requests
(they are forced to wait if they reach the L1 after the pending,
non-bypassing load; but if they reach the L1 before the non-bypassing
load then they make sure not to change the state of the line from IV if
they return before the non-bypassing load).

As part of this change, we also move the MSHR behavior from internally
in the GPUCoalescer for loads to the Ruby code (like all other
requests).  This is important to get correct hits and misses in stats
and other prints, since the GPUCoalescer MSHR behavior assumes all
requests serviced out of its MSHR also miss if the original request to
that line missed.

Although the SQC does not support stores, the TCP does.  Thus,
we could have applied a similar change to the GPU stores at the TCP.
However, since the TCP support assumes write-through caches and does not
attempt to allocate space in the TCP, we elected not to add this support
since it seems to run contrary to the intended behavior (i.e., the
intended behavior seems to be that writes just bypass the TCP and thus
should not need to wait for another write to the same cache line to
complete).

Additionally, making these changes introduced issues with deadlocks at
the TCC.  Specifically, some Pannotia applications have accesses to the
same cache line where some of the accesses are GLC (i.e., they bypass
the GPU L1 cache) and others are non-GLC (i.e., they want to be cached
in the GPU L1 cache). We have support already per CU in the above code.
However, the problem here is that these requests are coming from
different CUs and happening concurrently (seemingly because different
WGs are at different points in the kernel around the same time).
This causes a problem because our support at the TCC for the TBEs
overwrites the information about the GPU bypassing bits (SLC, GLC) every
time. The problem is when the second (non-GLC) load reaches the TCC, it
overwrites the SLC/GLC information for the first (GLC) load. Thus, when
the first load returns from the directory/memory, it no longer has
the GLC bit set, which causes an assert failure at the TCP.

After talking with other developers, it was decided the best way to handle
this and attempt to model real hardware more closely was to move the
point at which requests are put to sleep on the wakeup buffer from the
TCC to the directory. Accordingly, this patch includes support for that
-- now when multiple loads (bypassing or non-bypassing) from different
CUs reach the directory, all but the first one will be forced to wait
there until the first one completes, then will be woken up and
performed.  This required updating the WTRequestor information at the
TCC to pass the information about what CU performed the original request
for loads as well (otherwise since the TBE can be updated by multiple
pending loads, we can't tell where to send the final result to).  Thus,
I changed the field to be named CURequestor instead of WTRequestor since
it is now used for more than stores.  Moreover, I also updated the
directory to take this new field and the GLC information from incoming
TCC requests and then pass that information back to the TCC on the
response -- without doing this, because the TBE can be updated by
multiple pending, concurrent requests we cannot determine if this memory
request was a bypassing or non-bypassing request.  Finally, these
changes introduced a lot of additional contention and protocol stalls at
the directory, so this patch converted all directory uses of z_stall to
put requests on the wakeup buffer (and wake them up when the
current request completes) instead. Without this, protocol stalls cause
many applications to deadlock at the directory.

However, this exposed another issue at the TCC: other applications
(e.g., HACC) have a mix of atomics and non-atomics to the same cache
line in the same kernel.  The TCC transitions to the A state when an
atomic arrives, even if non-atomic requests are still pending. For
example, an atomic can arrive after the first pending load returns to
the TCC from the directory, which causes the TCC state to become V, while
there are still other pending loads at the TCC. This causes invalid
transition errors at the TCC when those pending loads return, because
the A state thinks they are atomics and decrements the pending atomic
count (plus the loads are never sent to the TCP as returning loads).
This patch fixes this by changing the TCC TBEs to model the number of
pending requests, and not allowing atomics to be issued from the TCC
until all prior, pending non-atomic requests have returned.

Change-Id: I37f8bda9f8277f2355bca5ef3610f6b63ce93563
2023-11-16 14:24:00 -08:00
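The three-load example in 4965367724 can be reproduced with a toy model. The sketch below is not the SLICC protocol: it assumes a 64-byte line, 4-byte array elements, loads that all issue before any response returns, and a hypothetical `count_hits_and_misses` function. It only illustrates how the transient IV state and the local wait buffer change the hit/miss classification.

```python
from collections import defaultdict

LINE_BYTES = 64  # assumed cache-line size for this sketch


def count_hits_and_misses(addresses, stall_on_pending):
    """Issue back-to-back loads, then let the L2 responses return.

    stall_on_pending=False: every load that finds the line invalid goes
    to the L2 and is counted as a miss (the behavior described as a bug).
    stall_on_pending=True: the first load moves the line to the transient
    IV state; later loads to that line wait and hit once data returns.
    """
    state = defaultdict(lambda: "I")  # per-line state: I, IV, or V
    waiting = defaultdict(list)       # per-line local wait buffer
    in_flight = []                    # lines requested from the L2
    hits = misses = 0

    for addr in addresses:
        line = addr // LINE_BYTES
        if state[line] == "V":
            hits += 1
        elif state[line] == "IV":
            waiting[line].append(addr)  # put to sleep until data returns
        else:
            misses += 1
            in_flight.append(line)
            if stall_on_pending:
                state[line] = "IV"

    # Responses return: IV --> V, and woken requests are serviced as hits.
    for line in in_flight:
        state[line] = "V"
        hits += len(waiting.pop(line, []))
    return hits, misses
```

With `Arr[0]`, `Arr[1]`, `Arr[2]` at byte addresses 0, 4, and 8, `count_hits_and_misses([0, 4, 8], False)` gives 0 hits and 3 misses, while `count_hits_and_misses([0, 4, 8], True)` gives 2 hits and 1 miss, which is the expected behavior stated in the commit message.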
Bobby R. Bruce
bfe899e48e stdlib, resources: Update JSON data in workload (#532)
- resources field in workload now supports a dict with resources id and
version.

- Older workload JSON files are still supported, but a deprecation warning has been added.
2023-11-16 10:11:13 -08:00
Giacomo Travaglini
047a494c2b mem-cache: Optimize strided prefetcher address generation
This commit optimizes the address generation logic in the strided
prefetcher by introducing the following changes:

(d is the degree of the prefetcher)

* Evaluate the fixed prefetch_stride only once (and not d-times)
* Replace 2d multiplications (d * prefetch_stride and distance *
prefetch_stride) with additions by updating the new base prefetch
address while looping

Change-Id: I49c52333fc4c7071ac3d73443f2ae07bfcd5b8e4
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Reviewed-by: Tiberiu Bucur <tiberiu.bucur@arm.com>
2023-11-16 09:48:15 +00:00
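The optimization in 047a494c2b replaces per-iteration multiplications with a running sum. A minimal Python sketch of that idea follows, assuming prefetch addresses of the form base + (distance + d) * stride for d = 1..degree. Both function names are hypothetical and neither is the gem5 C++ code.

```python
def addresses_with_multiplies(base, stride, degree, distance):
    """Recompute both products on every iteration (2 * degree multiplies)."""
    return [base + distance * stride + d * stride for d in range(1, degree + 1)]


def addresses_with_additions(base, stride, degree, distance):
    """Multiply once outside the loop, then only add inside the loop."""
    addr = base + distance * stride  # new base prefetch address
    result = []
    for _ in range(degree):
        addr += stride  # update the base while looping
        result.append(addr)
    return result
```

Both functions return the same list for any inputs, including negative strides, so the change affects only the amount of work done per generated address.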
Nikolaos Kyparissas
2abd65c270 mem: added distance parameter to stride prefetcher
The Stride Prefetcher will skip this number of strides ahead of the
first identified prefetch, then generate `degree` prefetches at
`stride` intervals. A value of zero indicates no skip (i.e. start
prefetching from the next identified prefetch address).

This parameter can be used to increase the timeliness of prefetches by
starting to prefetch far enough ahead of the demand stream to cover
the memory system latency.

[Richard Cooper <richard.cooper@arm.com>:
- Added detail to commit comment and `distance` Param documentation.
- Changed `distance` Param from `Param.Int` to `Param.Unsigned`.
]

Change-Id: I6c4e744079b53a7b804d8eab93b0f07b566f0c08
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Signed-off-by: Richard Cooper <richard.cooper@arm.com>
2023-11-16 09:48:09 +00:00
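To make the effect of the `distance` parameter from 2abd65c270 concrete, here is a small Python sketch. It assumes that the "next identified prefetch address" is the demand address plus one stride; `stride_prefetches` is a hypothetical function for illustration, not the prefetcher's actual interface.

```python
def stride_prefetches(addr, stride, degree, distance=0):
    """Return `degree` prefetch addresses, skipping `distance` strides first.

    distance=0 starts at the next address in the stream (addr + stride).
    """
    first = addr + (distance + 1) * stride
    return [first + i * stride for i in range(degree)]
```

For a demand access at 0x1000 with a 64-byte stride and a degree of 2, a distance of 0 yields 0x1040 and 0x1080, while a distance of 4 yields 0x1140 and 0x1180. The number of prefetches is unchanged; they are simply issued further ahead of the demand stream.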
Yu-Cheng Chang
ceabe86b31 arch-riscv: Add overrides to RISC-V Interrupts class (#568)
2023-11-15 18:36:15 -08:00
Matt Sinclair
c3326c78e6 mem-ruby, gpu-compute: fix SQC/TCP requests to same line
Currently, the GPU SQC (L1I$) and TCP (L1D$) have a performance bug
where they do not behave correctly when multiple requests to the same
cache line overlap one another.  The intended behavior is that if the
first request that arrives at the Ruby code for the SQC/TCP misses, it
should send a request to the GPU TCC (L2$).  If any requests to the
same cache line occur while this first request is pending, they should
wait locally at the L1 in the MSHRs (TBEs) until the first request has
returned.  At that point they can be serviced, and assuming the line
has not been evicted, they should hit.

For example, in the following test (on 1 GPU thread, in 1 WG):

load Arr[0]
load Arr[1]
load Arr[2]

The expected behavior (confirmed via profiling on real GPUs) is that
we should get 1 miss (Arr[0]) and 2 hits (Arr[1], Arr[2]) for such a
program.

However, the current support in the VIPER SQC/TCP code does not model
this correctly.  Instead it lets all 3 concurrent requests go straight
through to the TCC instead of stopping the Arr[1] and Arr[2] requests
locally while Arr[0] is serviced.  This causes all 3 requests to be
classified as misses.

To resolve this, this patch adds support into the SQC/TCP code to
prevent subsequent, concurrent requests to a pending cache line from being
sent in parallel with the original one.  To do this, we add an
additional transient state (IV) to indicate that a load is pending to
this cache line.  If a subsequent request of any kind to the same cache
line occurs while this load is pending, the requests are put on the
local wait buffer and woken up when the first request returns to the
SQC/TCP.  Likewise, when the first load is returned to the SQC/TCP, it
transitions from IV --> V.

As part of this support, additional transitions were also added to
account for corner cases such as what happens when the line is evicted
by another request that maps to the same set index while the first load
is pending (the line is immediately given to the new request, and when
the load returns it completes, wakes up any pending requests to the same
line, but does not attempt to change the state of the line) and how GPU
bypassing loads and stores should interact with the pending requests
(they are forced to wait if they reach the L1 after the pending,
non-bypassing load; but if they reach the L1 before the non-bypassing
load then they make sure not to change the state of the line from IV if
they return before the non-bypassing load).

As part of this change, we also move the MSHR behavior from internally
in the GPUCoalescer for loads to the Ruby code (like all other
requests).  This is important to get correct hits and misses in stats
and other prints, since the GPUCoalescer MSHR behavior assumes all
requests serviced out of its MSHR also miss if the original request to
that line missed.

Although the SQC does not support stores, the TCP does.  Thus,
we could have applied a similar change to the GPU stores at the TCP.
However, since the TCP support assumes write-through caches and does not
attempt to allocate space in the TCP, we elected not to add this support
since it seems to run contrary to the intended behavior (i.e., the
intended behavior seems to be that writes just bypass the TCP and thus
should not need to wait for another write to the same cache line to
complete).

Additionally, making these changes introduced issues with deadlocks at
the TCC.  Specifically, some Pannotia applications have accesses to the
same cache line where some of the accesses are GLC (i.e., they bypass
the GPU L1 cache) and others are non-GLC (i.e., they want to be cached
in the GPU L1 cache). We have support already per CU in the above code.
However, the problem here is that these requests are coming from
different CUs and happening concurrently (seemingly because different
WGs are at different points in the kernel around the same time).
This causes a problem because our support at the TCC for the TBEs
overwrites the information about the GPU bypassing bits (SLC, GLC) every
time. The problem is when the second (non-GLC) load reaches the TCC, it
overwrites the SLC/GLC information for the first (GLC) load. Thus, when
the first load returns from the directory/memory, it no longer has
the GLC bit set, which causes an assert failure at the TCP.

After talking with other developers, it was decided the best way to handle
this and attempt to model real hardware more closely was to move the
point at which requests are put to sleep on the wakeup buffer from the
TCC to the directory. Accordingly, this patch includes support for that
-- now when multiple loads (bypassing or non-bypassing) from different
CUs reach the directory, all but the first one will be forced to wait
there until the first one completes, then will be woken up and
performed.  This required updating the WTRequestor information at the
TCC to pass the information about what CU performed the original request
for loads as well (otherwise since the TBE can be updated by multiple
pending loads, we can't tell where to send the final result to).  Thus,
I changed the field to be named CURequestor instead of WTRequestor since
it is now used for more than stores.  Moreover, I also updated the
directory to take this new field and the GLC information from incoming
TCC requests and then pass that information back to the TCC on the
response -- without doing this, because the TBE can be updated by
multiple pending, concurrent requests we cannot determine if this memory
request was a bypassing or non-bypassing request.  Finally, these
changes introduced a lot of additional contention and protocol stalls at
the directory, so this patch converted all directory uses of z_stall to
put requests on the wakeup buffer (and wake them up when the
current request completes) instead. Without this, protocol stalls cause
many applications to deadlock at the directory.

However, this exposed another issue at the TCC: other applications
(e.g., HACC) have a mix of atomics and non-atomics to the same cache
line in the same kernel.  The TCC transitions to the A state when an
atomic arrives, even if non-atomic requests are still pending. For
example, an atomic can arrive after the first pending load returns to
the TCC from the directory, which causes the TCC state to become V, while
there are still other pending loads at the TCC. This causes invalid
transition errors at the TCC when those pending loads return, because
the A state thinks they are atomics and decrements the pending atomic
count (plus the loads are never sent to the TCP as returning loads).
This patch fixes this by changing the TCC TBEs to model the number of
pending requests, and not allowing atomics to be issued from the TCC
until all prior, pending non-atomic requests have returned.

Change-Id: I37f8bda9f8277f2355bca5ef3610f6b63ce93563
2023-11-15 19:23:51 -06:00
Matt Sinclair
065ddf759f mem-ruby, gpu-compute: fix bug with GPU bypassing loads
The current GPU TCP (L1D$) Ruby SLICC code had a bug where a GPU
load that wants to bypass the L1D$ (e.g., GLC or SLC bit was set)
but the line is in Invalid when that request arrives, results in
a non-bypassing load being sent to the GPU TCC (L2$) instead of
a bypassing load.

This issue was not caught by the current nightly or weekly tests,
because the tests do not test for correctness in terms of hits
and misses in the caches.  However, tests for these corner cases
expose this issue.

To fix this, this patch removes the check that the entry is valid
when deciding what to do with a bypassing GPU load -- since the
TCP Ruby code has transitions for bypassing loads in both I and V,
we can simply call the LoadBypassEvict event in both cases and the
appropriate transition will handle the bypassing load given the
cache line's current state in the TCP.

Change-Id: Ia224cefdf56b4318b2bcbd0bed995fc8d3b62a14
2023-11-15 19:23:51 -06:00
hungweihsuG
83f1fe3fec dev: add debug flag in register bank. (#386)
Print extra logs for the full/partial read/write access to the registers
through the register bank. The debug flag is empty by default and would
not print anything.

Test: run the unit test dev/reg_bank.test.xml to check that the change does
not affect the original functionality.
Run gem5 with debug flags and use m5term to poke at registers.
2023-11-15 10:04:46 -08:00
wmin0
a8440f367d arch-riscv: Move fault handler addr logic to ISA (#554)
mtvec.mode is extended in new RISC-V proposals, such as fast interrupt.
This change moves that logic from the Fault class to the ISA class for
extensibility.

Ref: https://github.com/riscv/riscv-fast-interrupt
2023-11-15 10:04:01 -08:00
BujSet
4a5ec70e08 gpu-compute: Minor edits for atomic no returns and stores (#565)
Since returned data is not needed for AtomicNoReturn and Store memory
requests, the coalescer need not spend time writing in dummy data for
packets of these types.

Change-Id: Ie669e8c2a3bf44b5b0c290f62c49c5d4876a9a6a
2023-11-15 07:20:07 -08:00
Bobby R. Bruce
30787b59d4 tests: Remove multiple suites per job for Weekly tests (#562)
I believe the weekly test failures (example:
https://github.com/gem5/gem5/actions/runs/6832805510/job/18592876184)
are due to a container running out of memory when running the very-long
x86 boot tests. I found that the `-t $(nproc)` flag meant, on our
runners, 4 x86 full system gem5 simulations were being spawned. Locally I
found these gem5 x86 boot sims can reach 4GB in size so I suspect they
eventually grew big enough to exceed the 16GB memory of the VM.

I have removed `-t $(nproc)`, meaning each execution runs single-threaded,
to see if this fixes the issue (we may want to use `-t 2` later if the
Weeklies take too long running single-threaded).
2023-11-14 11:00:07 -08:00
Bobby R. Bruce
8859592893 tests,gpu-compute: Fix Lulesh 'Obtain LULESH' step (#563)
The `working-directory: ${{ github.workspace }}` line was included by
mistake and resulted in this step failing as the command was being
executed in the wrong directory.

Example failure:
https://github.com/gem5/gem5/actions/runs/6832831307/job/18593080567
2023-11-14 08:43:00 -08:00
Derek Christ
e95cab429f configs,ext,stdlib: Update DRAMSys integration (#525)
Recent breaking changes in the DRAMSys API require user code to be
updated. These updates have been applied to the gem5 integration.

Furthermore, as DRAMSys started to use CMake dependency management,
it is no longer sensible to maintain two separate build systems for
DRAMSys. The use of the DRAMSys integration in gem5 will therefore
from now on require that CMake is installed on the target machine.

Additionally, support for snapshots has been implemented in DRAMSys
and coupled with gem5's checkpointing API.
2023-11-14 08:05:11 -08:00
Derek Christ
99553fdbee systemc: Fix two bugs in gem5-to-tlm bridge (#542)
This commit fixes a violation of the TLM2.0 protocol as well as a
bug regarding back-pressure:
- In the BEGIN_REQ phase, transaction objects are required to set
  their response status to TLM_INCOMPLETE_RESPONSE. This was not
  the case in the packet2payload function that converts gem5 packets
  to TLM2.0 generic payloads.
- When the target applied back-pressure to the initiator, an assert
  condition was triggered as soon as the response was retried. The
  cause of this was an unintentional nullptr-access into a map.
2023-11-14 08:02:58 -08:00
BujSet
65b44e6516 mem-ruby: Fix for not creating log entries on atomic no return requests (#546)
Augment Datablock and WriteMask with an optional argument that
distinguishes between atomic return and atomic no return requests. In the
case of atomic no return requests, log entries should not be created when
performing the atomic.

Change-Id: Ic3112834742f4058a7aa155d25ccc4c014b60199a
2023-11-14 07:54:42 -08:00
Daniel Kouchekinia
be5c03ea9f mem-ruby,configs: Add GPU GLC Atomic Resource Constraints (#120)
Added a resource constraint, AtomicALUOperation, to GLC atomics
performed in the TCC.

The resource constraint uses a new class, ALUFreeList array. The class
assumes the following:
- There are a fixed number of atomic ALU pipelines
- While a new cache line can be processed in each pipeline each cycle,
  if a cache line is currently going through a pipeline, it can't be
  processed again until it's finished

Two configuration parameters have been used to tune this behavior:
- tcc-num-atomic-alus corresponds to the number of atomic ALU pipelines
- atomic-alu-latency corresponds to the latency of atomic ALU pipelines
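
The two assumptions above can be sketched as a small Python model. This is
only an illustration, not the gem5 implementation: the class name
`AtomicALUModel`, its method `try_issue`, and the cycle arguments are
hypothetical, and `num_alus` and `latency` stand in for tcc-num-atomic-alus
and atomic-alu-latency.

```python
class AtomicALUModel:
    """Hypothetical model of a fixed pool of atomic ALU pipelines."""

    def __init__(self, num_alus, latency):
        self.num_alus = num_alus    # number of atomic ALU pipelines
        self.latency = latency      # cycles a line spends in a pipeline
        self.busy_until = {}        # line address -> cycle it leaves the pipeline
        self.issue_cycle = None     # cycle the issue counter refers to
        self.issued = 0             # lines accepted in issue_cycle

    def try_issue(self, line_addr, now):
        """Return True if line_addr can enter a pipeline at cycle now."""
        # A line that is still in flight cannot be processed again.
        if self.busy_until.get(line_addr, 0) > now:
            return False
        # Each pipeline accepts at most one new line per cycle.
        if self.issue_cycle != now:
            self.issue_cycle = now
            self.issued = 0
        if self.issued >= self.num_alus:
            return False
        self.issued += 1
        self.busy_until[line_addr] = now + self.latency
        return True


alu = AtomicALUModel(num_alus=2, latency=3)
first = alu.try_issue(0x40, now=0)    # accepted by the first pipeline
second = alu.try_issue(0x80, now=0)   # accepted by the second pipeline
third = alu.try_issue(0xC0, now=0)    # rejected: both pipelines took a line
repeat = alu.try_issue(0x40, now=1)   # rejected: 0x40 is still in flight
later = alu.try_issue(0x40, now=3)    # accepted: 0x40 finished at cycle 3
```

A rejected request would simply be retried on a later cycle, which is how a
resource constraint of this kind throttles atomics.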

Change-Id: I25bdde7dafc3877590bb6536efdf57b8c540a939
2023-11-14 07:48:48 -08:00
Nikolaos Kyparissas
38045d7a25 mem-cache: Added clean eviction check for prefetchers.
pkt->req->isCacheMaintenance() would not include a check
for clean eviction before notifying the prefetcher,
causing gem5 to crash.

Change-Id: I1d082a87a3908b1ed46c5d632d45d8b09950b382
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2023-11-14 15:20:52 +00:00
Richard Cooper
6416304e07 mem-cache: Update default prefetch options.
Update the default prefetch options to achieve out-of-the-box
prefetcher performance closer to that which a typical user would
expect. Configurations that set these parameters explicitly will be
unaffected.

The new defaults were identified as part of work on gem5 prefetchers
undertaken by Nikolaos Kyparissas while on internship at Arm.

Change-Id: Id63868c7c8f00ee15a0b09a6550780a45ae67e55
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2023-11-14 15:20:52 +00:00
Richard Cooper
8598764a03 mem-cache: Squash prefetch queue entries by block address.
Prefetch queue entries were being squashed by comparing the address
of each queued prefetch against the block address of the demand
access. As a result, only prefetches that happened to fall on a
cache-line block boundary were squashed.

This patch converts the prefetch addresses to block addresses before
comparison.
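
A minimal sketch of the difference, assuming a 64-byte cache line and a
hypothetical helper `block_address` (neither is taken from the patch):

```python
def block_address(addr, blk_size):
    """Align addr down to the start of its cache block.

    blk_size must be a power of two.
    """
    return addr & ~(blk_size - 1)


BLK_SIZE = 64            # assumed cache-line size in bytes
demand_addr = 0x1000     # demand access
prefetch_addr = 0x1010   # queued prefetch inside the same 64-byte block

# Old comparison: raw prefetch address against the demand block address.
# The prefetch is not on a block boundary, so it is not squashed.
old_match = prefetch_addr == block_address(demand_addr, BLK_SIZE)

# New comparison: both sides converted to block addresses, so the
# prefetch to the same block is recognised and squashed.
new_match = (block_address(prefetch_addr, BLK_SIZE)
             == block_address(demand_addr, BLK_SIZE))
```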

Change-Id: I55ecb4919e94ad314b91c7795bba257c550b1528
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2023-11-14 15:20:52 +00:00
Yu-Cheng Chang
f11227b4a0 systemc: Fix gcc13 systemC compilation error (#520)
issue: https://github.com/gem5/gem5/issues/472
2023-11-14 03:54:35 -08:00
Bobby R. Bruce
6ac6d0c340 tests,misc: Add "build/ALL/gem5.fast" Clang compilation to CI (#432)
While we do run compiler tests weekly, 9 times out of 10 the issue is a
strict check in clang that we did not run before incorporating code into
the codebase.

Therefore, running a clang compilation as part of our CI would help us
catch errors quicker.
2023-11-14 03:53:28 -08:00
Daniel Kouchekinia
dde3d10aea cpu: Remove SLC bit restraint for GPU tester (#552)
This reverts gem5#133, the temporary work-around for gem5#131, allowing
both SLC and GLC atomic requests to be made in the GPU tester.

The underlying issues behind gem5#131 have been resolved by gem5#367 and
gem5#397.
2023-11-14 03:47:34 -08:00
Rajarshi Das
f71450d26d python,util: Fix magic number check in decode_inst_dep_trace.py (#560)
The decode_inst_dep_trace.py opens the trace file in read mode, and
subsequently reads the magic number from the trace file. Once the number
is read, it is compared against the string 'gem5' without decoding it
first. This causes the comparison to fail.
The fix addresses this by calling the decode() routine on the output of
the read() call. Details are in the associated issue #543.
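
A minimal sketch of the failure mode, not the script's actual code: in
Python 3, bytes read from a binary stream never compare equal to a `str`, so
the check can only pass after decoding. The helper name `has_gem5_magic` and
the in-memory stream are hypothetical stand-ins.

```python
import io


def has_gem5_magic(stream):
    """Return True if the stream starts with the 4-byte magic 'gem5'."""
    magic = stream.read(4)            # bytes when the stream is binary
    return magic.decode() == "gem5"   # decode before comparing to a str


# Hypothetical stand-in for a trace file opened in binary mode.
trace = io.BytesIO(b"gem5" + b"\x00\x01")

undecoded_equal = trace.getvalue()[:4] == "gem5"   # bytes vs str comparison
decoded_equal = has_gem5_magic(trace)
```

Here `undecoded_equal` is False even though the file starts with the right
bytes, while `decoded_equal` is True.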
2023-11-14 03:47:04 -08:00
Bobby R. Bruce
1c7934c9d6 tests,util-docker: Remove gcc 9 support (#556)
When compiling gem5 with GCC-9, the gem5 object files are nearly double
the size of those produced by other GCC versions. This increase in size
means we need >16GB of memory available when linking. As we do not want to
mandate >16GB systems for building gem5, we are going to drop GCC-9. The
exact cause of this bug is unknown. This is highlighted in Issue #555.
2023-11-14 03:45:51 -08:00
Matt Sinclair
48fde5a9c6 mem-ruby, gpu-compute: fix formatting of TCC (#536)
Fix several improperly indented prints and extraneous blank lines in
the SLICC code for the GPU TCC (L2 cache).
2023-11-13 15:01:30 -08:00
Matt Sinclair
7d0a1fb284 mem-ruby, gpu-compute: fix typo in GPU coalescer deadlock print (#535)
The GPU Coalescer's deadlock print did not previously print a newline at
the end of each deadlock, which caused confusion when there were
multiple deadlocks as each deadlock print would appear to go with the
address after it. This patch fixes this issue.
2023-11-13 15:01:01 -08:00