The interrupt handler's base address is sent via MMIO and must be
shifted by 8 bits to convert to a byte address. The current code is
shifting the MMIO dword first then assigning, resulting in the top 8
bits being shifted out.
This changeset fixes the issue by assigning the dword to the 64-bit
address first then shifting after. Similarly, the upper dword is cast to
a 64-bit value first before shifting.
This fixes some "fence fallback timeout" errors in the m5term output.
These timeouts become a problem because the driver will reset after a
few hundred of them, killing any running GPU applications as part of the
process.
Change-Id: I0beec313f533765c94063bcf4de8c65aacf2986b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65092
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
The GART table is a legacy 1-level page table primarily used for
supervisor mode accesses to GPUs. The PTE size is 64-bits, not 32-bit.
This causes memory sizes >3GB (in X86) to fail loading amdgpu driver.
This changeset fixes the issue by setting the GART mappings to the
correct data type.
Change-Id: Ibfba2443675fe28316d26afa5f1a14885fdce40c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65091
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
The current implementation of SDMA copy calls the GPU memory manager's
read/write method one time passing a physical address as the
source/destination. This implicitly assumes the physical addresses are
contiguous which is generally not true for large allocations. This
results in reading from/writing to the wrong address.
This changeset fixes the problem by copying large copies in chunks of
the minimum possible page size on the GPU (4kB). Each page is translated
seperately to ensure the correct physical address. The final copy "done"
callback is only used for the last transfer. The transfers should
complete in order so the copy command will not complete until all chunks
have been copied. Tested and verified on an application with a large
allocation (~5GB).
Change-Id: I27018a963da7133f5e49dec13b0475c3637c8765
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64752
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Vega allows for any integer multiple of 4kB pages. However, the current
implementation is designed for 4kB page primarily. In order to support
variable page sizes, the physical address calculation needs to be
updated to add the virtual page offset to the base physical address
rather than bitwise-OR. Bitwise-OR assumes physical pages are at
aligned to the page size which is generally not the case for very
large pages (1GB+).
This changeset changes all of the physical address computations to add
the virtual offset to the physical page address. This fixes many GPUFS
applications which use larger pages. The support was tested by
hipMalloc'ing ~5GB to induce a large page being created. The test
application now passes verification with this change.
Change-Id: Ic8d1475e001def443f3e4ab609449bca0c40b638
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64751
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
It never made much sense to set checkpoint via the Simulator module as
Checkpoints are very tightly coupled with the Workload being run. This
change therefore moves the checkpoint to the set_workload functions.
Setting checkpoints via the Simulator is deprecated and will be removed
in a future release.
Change-Id: I24d2133b38a86423d3553ec888c917c5fe47b93d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64571
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Some CPU wrappers like the Fastmodel one do extend the
ThreadContext interface in order to retrieve system register
state... By bypassing the TC interface and by using the ISA
instead, we are basically forcing users to extend the ISA
as well to intercept these calls.
So with this patch we are making sure every system register is accessed
(like HCR_EL2 or SCR_EL3) through the thread context. This of course
does not apply to the CPU interface registers as we still use the ISA
storage for them. In the future we should probably move that storage
from the ISA class to the Gicv3CPUInterface class itself
This is also simplifying Gicv3CPUInterface::isEL3OrMon:
currEL already covers the AArch32 case so no need to
differentiate between AArch32 and AArch64
Change-Id: I446a14a6e12b77e1a62040b3422f79ae52cc9eec
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64913
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
This patch is adding an outfile parameter to the TarmacTracer
This has 3 options:
1) stdoutput = dump to standard output (default behaviour)
2) stderror = dump to standard error
3) file = dump to a file. As there is one tracer per CPU,
this means every CPU will dump its trace to a different file,
named after the tracer name (e.g. cpu0.tracer, cpu1.tracer)
It is still possible to redirect to a file with option 1 and 2
thanks to common bash redirection. What the third option is
really buying us is the capability to dump CPU traces on
separate files, and to separate the trace output from the debug-flag
output
Change-Id: Icd2bcc721f8598d494c9efabdf5e092666ebdece
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63892
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
During gem5 build, the compiler may produce some large intermediate
files. The default path is /tmp, but in some usecase, it's under a small
file system, and we may want to change the storage path to a sufficient
large file system. This CL captures TMPDIR environment variables, to
allow users change the default temporary directory.
Change-Id: Ib3fad301f36df9f3f08eb9b6cfeb4df1b7df5d1a
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64873
Reviewed-by: Gabe Black <gabeblack@google.com>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The previous SimPoint warmup length was limited by the gaps between
the starting instruction of one SimPoint and the ending instruction of
the SimPoint before it. This was to prevent duplicate SimPoints, but it
can significantly limit the warmup length.
In this commit, the warmup length limitation will be extended to the
starting instruction of one SimPoint regardless of the gap between
SimPoints.
A SimPoint checkpoint generator is created to help taking checkpoints
for SimPoints and make sure multiple SimPoint checkpoints are taken
when there are multiple SimPoints sharing the same starting instruction
Change-Id: If95f6813e8cbf5c01e41135c1b1bb91ed2e950ad
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64351
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
For SVE it is possible to override the run-time vector length (VL) per
exception level by setting the value in the appropriate ZCR_ELx
registers. In general instructions query the appropriate registers
during execute() to determine the actual vector length. The exception
to this rule are the SVE Macromem instructions which use the VL to
determine the number of micro-ops to crack into during
decode. However, as there is no available ExecContext during the
decode stage these instructions rely on a cached value stored in the
decoder.
Previously we were updating the VL in the decoder using potentially
stale values of ZCR_ELx by calling the update before actually setting
the registers themselves. We now set the registers before updating the
decoder.
Change-Id: I0167095699f7f950ee99fc42c7c8292fe8938d28
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64331
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
As of https://gem5-review.googlesource.com/c/public/gem5/+/64177 we
support version 22.04. This patch therefore updates the testing
infrastructure (kokoro/quick, nightly/long, weekly/extra-long) to use
the Ubuntu 22.04 docker image.
The "jenkins/gem5art-tests.sh" test script has been updated to no longer
require the `pip upgrade`. This was needed for Ubuntu 20.04 as it
utilized an older version of pip which did not have all the dependencies
these tests requried. As of Ubuntu 20.04 this is no longer required.
Change-Id: Ia8f8b1b2c62ad5d5a8419cb31b6a1d2b6dff7ac9
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64291
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
This change makes the default vendor string AuthenticAMD.
GLIBC now is much more strict about checking for the current system's
supported features. In Ubuntu 22.04, when trying to load a dynamically
linked file, the CPUID is checked for the required features. If they are
not there, an error saying ISA level too low is returned and the program
crashes.
The underlying issue is that GLIBC does not check and populate the
cpu_feature data structure if it does not detect a *known* CPU model.
The options are hardcoded. See the following file for the glibc code.
glibc/sysdeps/x86/cpu-features.c
Note that the cpu_features is not populated with the
COMMON_CPUID_INDEX_1 unless there is a known family, which is only set
if the vendor string matches a known vendor.
This change uses AuthenticAMD instead of the alternatives because the
checks in glibc are most simple (no special cases) for AuthenticAMD in
the init_cpu_features functions.
GLIBC has been unable to populate the cpu_features datastructure
correctly with gem5 for a long time. However, this has just now become a
problem for us because the library now is more strict on not allowing
code to execute unless the processor meets certain minimum requirements.
I believe the commit for GLIBC which caused this breakage is
ecce11aa0752735c4fd730da6e7c9e0b98e12fb8
See https://sourceware.org/pipermail/binutils/2020-October/113593.html
for more details on that commit.
Change-Id: I8eedb46f577361e749ad8d0adda4fd0753e99960
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64831
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
The current MESI_Two_Level protocol's L1 caches updates the MRU information twice per request on misses -- once when the request reaches Ruby and once when the miss is returned from another level of the memory hierarchy.
Although this approach does not cause any correctness bugs for replacement policies like LRU since this request is the LRU in both cases, it does not work correctly for other policies like SecondChance and LFU, where updating the information twice (for misses) causes them to devolve to LRU.
Note that this was not directly a problem with Ruby previously, because it only supported LRU-based policies that were unaffected by this. However, with the integration of 20879 Ruby now uses the same replacement policies as Classic (which has additional, non-LRU based replacement policies).
This patch resolves this problem by not updating the MRU information a second time for the misses. It has been tested and validated with the replacement policy tests.
Change-Id: I9e7e96a9d6c09f3d6b7daae7115ef091ac3bdc08
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64371
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
In the case where no regex was specified (i,e., `regex == None`), the
`_iterable_regex` function returned `None`. For the testlib to work this
function must return an iterable.
Without this patch it is not possible to compare two files without a
specifying a regex.
Change-Id: Ibc8a2f783c3b786dbdca6a4284850b594024f2f9
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64851
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
In python, when you use -c it consumes all subsequent parameters and
appends them to argv. Now, gem5 and python behave the same with -c.
Python:
> python -c "import sys; print(sys.argv)" --hello -j
['-c', '--hello', '-j']
gem5:
> gem5.opt -c "import sys; print(sys.argv)" --hello -j
gem5 Simulator System. https://www.gem5.org
gem5 is copyrighted software; use the --copyright option for details.
gem5 version [DEVELOP-FOR-22.1]
gem5 compiled Oct 17 2022 15:47:46
gem5 started Oct 17 2022 15:53:45
gem5 executing on challenger, pid 4021103
command line: build/ALL/gem5.opt -c 'import sys; print(sys.argv)' --hello -j
['-c', '--hello', '-j']
Change-Id: I53e87712be9523e0583149235c9787c92618f884
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63151
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
The conversion logic between gem5 register id and iris resouce id is
duplicated in read and write function. Some of them also doesn't handle
the invalid id correctly. This change wraps the logic, fixes them, and
improves the debug messages by printing the register names.
Change-Id: I093d05f5f06d804d5f01988c2a7ffa60244c5516
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64651
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Gabe Black <gabeblack@google.com>
The current protoc_action generates declarations that impose the requirement
for .proto files to import from the BUILDDIR. This prevents .proto files to
import themselves relative to their own directory.
For example, if there are two files build/pkg/a.proto and build/pkg/b.proto,
and b.proto imports a.proto, it must do so as "import pkg/a.proto", as opposed
to "import a.proto"; otherwise, the generated declaration will give rise to a
compilation error.
This is a problem for EXTRAS where .proto files are gem5-independent, so
they cannot import from gem5's build directory. An example is
https://github.com/ARM-software/ATP-Engine
This commit changes the protoc_action so that declarations are generated
relative to the SOURCE directory. This enables .proto files to import
other .proto files within the same directory.
Note it is not possible to import other .proto files in different
directories, but support for it is not currently necessary.
More details on the proto path:
https://developers.google.com/protocol-buffers/docs/reference/cpp-generated#invocation
Related:
https://gem5-review.googlesource.com/c/public/gem5/+/55903
Change-Id: Ib3e67ae817f8ad0b6803c90d23469267eff16178
Signed-off-by: Adrián Herrera Arcila <adrian.herrera@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64491
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
This docker image contains ALL/gem5.fast compiled using the 22.04
min-dependencies image.
This image is available at gcr.io/gem5-test/gem5-all-min-dependencies:
```
docker pull gcr.io/gem5-test/gem5-all-min-dependencies
```
Change-Id: I0af4a629e7082df1d76a8459ebfc4fb0a91e2855
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64431
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
This debug flag is used to print spammy SDMA DPRINTFs, such as an SDMA
copy printing the data of large transfers 8 bytes per line at a time. For
those prints, the SDMAEngine flag will now only print the first and last
qword of the transfer and the new SDMAData flag is needed for verbose
data printing. This makes the SDMAEngine flag still useful for verifying
copies in applications with predictable data such as square.
Additionally, the memory allocation/deallocation done solely for a print
statement is removed in favor of casting the data to the printed type.
Change-Id: I18c1918ef9085cca4570f79881ee63d510ccc32f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64452
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Page directory entries (PDEs) can be interpreted as leaf node page
table entries (PTEs) if the "p" bit is set. This is used for flexible
page sizes in Vega. Currently there is only support for PDE level 0
entries which can be interpreted as 2MB pages. This changeset adds
support for PDE1 and PDE2 which can be used to represent 1GB and 512GB
pages. PDE1-as-PTE entries can be tested and were verified on
applications by allocating >2GB of data. PDE0 is untested due to being
too large for simulation, but the implementation is similar to PDE0
and PDE1.
Change-Id: I801cbb5ec79110d57d2db760cc689c2e5778f9bb
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64451
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>