Starting with ROCm 5.4, MI100 and MI200 make use of the translate
further (F) bit in the page table. This bit enables mixing 4KiB and 2MiB
pages and is functionally equivalent to mixing page sizes using the
PDE.P bit, for which gem5 currently has support.
With the PDE.P bit set, we stop walking and the page size is determined
by the level in the page table we stopped at. For example, stopping at
level 2 would be a 1GiB page, stopping at level 3 would be a 2MiB page.
This assumes most pages are 4KiB.
When the F bit is used, it is assumed most pages are 2MiB and we will
stop walking at the 3rd level of the page table unless the F bit is set.
When the F bit is set, the 2nd level PDE contains a block fragment size
representing the page size of the next PDE in the form of 2^(12+size).
If the next page has the F bit set we continue walking to the 4th level.
The block fragment size is hardcoded to 9 in the driver; therefore, we
assert that the block fragment size must be 0 or 9.
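The fragment-size arithmetic above can be sketched as follows. This is a
minimal illustration of the 2^(12+size) rule only, not gem5's page table
walker; the function name `fragment_page_bytes` is hypothetical.

```python
def fragment_page_bytes(block_fragment_size: int) -> int:
    """Return the page size in bytes encoded by a block fragment size.

    Follows the 2^(12 + size) rule from the text. Only 0 and 9 are
    accepted, mirroring the assertion described above.
    """
    assert block_fragment_size in (0, 9), "unsupported block fragment size"
    return 1 << (12 + block_fragment_size)


print(fragment_page_bytes(0))  # 4096 bytes (4KiB)
print(fragment_page_bytes(9))  # 2097152 bytes (2MiB)
```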
This enables MI200 with ROCm 5.4+ in gem5. This functionality was
determined by examining the driver source code in Linux; there is no
public documentation about this feature or why the change was made in or
around ROCm 5.4.
Change-Id: I603c0208cd9e821f7ad6eeb1d94ae15eaa146fb9
This SDMA packet is much more common starting around ROCm 5.4.
Previously this was mostly used to clear page tables after an
application ended and was therefore left unimplemented. It is
now used for basic operation like device memsets.
This patch implements constant fill as it is now necessary.
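As a rough sketch of what a constant fill does, assuming a 32-bit fill
value written little-endian over a byte-addressed buffer (the function
name, argument names, and the dword granularity are assumptions for
illustration, not the SDMA packet layout):

```python
import struct


def constant_fill(memory: bytearray, dest: int, value: int, count_bytes: int) -> None:
    """Fill memory[dest:dest+count_bytes] with a repeated 32-bit value."""
    # Assumption: the fill length is a whole number of dwords and in bounds.
    assert count_bytes % 4 == 0
    assert 0 <= dest and dest + count_bytes <= len(memory)
    pattern = struct.pack("<I", value & 0xFFFFFFFF)
    memory[dest:dest + count_bytes] = pattern * (count_bytes // 4)


# A device memset to zero is a constant fill with value 0.
buf = bytearray(b"\xff" * 16)
constant_fill(buf, 0, 0, 16)
```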
Change-Id: I9b2cf076ec17f5ed07c20bb820e7db0c082bbfbc
When using the new operator, delete should be called
on any allocated memory after its use is complete.
Change-Id: Id5fcfb264b6ddc252c0a9dcafc2d3b020f7b5019
A previous change added a vop2Helper to remove 100s of lines of common
code from VOP2 instructions related to processing SDWA and DPP support.
That change inadvertently changed the type of operand source 0 from
const to non-const. The vector container operator[] does not allow
reading a scalar value such as a constant, a dword literal, etc. The
error shows up in the form of: assert(!scalar) in operand.hh.
Since the SDWA and DPP cases need to modify the source vector and
non-SDWA/DPP cases might require const, we make a non-const copy of the
const source 0 vector and place it in a temporary non-const vector. This
non-const vector is passed to the lambda function implementation of the
instruction. This prevents needing a const and non-const version of the
lambda and avoids needing to propagate the template parameters through
the various SDWA/DPP helper methods, which seems unlikely to work
anyway as they need to modify the vector.
As a result of this, as more VOP2 instructions are implemented using
this helper, they will need to specify the const and non-const template
parameters of the vector container needed for the instruction.
Change-Id: Ia0b3c550d7de32b830040007a110f4821e3385aa
In "src/cpu/testers/gpu_ruby_test" a random number generator was used.
This was using the C++ "<random>" library. This patch changes it to the
gem5 random class (declared in "base/random.hh").
In addition to this, nondeterministic behavior has been removed. Via
"protocol_tester.cc" the RNG is either seeded with a seed specified by
the user, or uses the gem5 default seed. This ensures reproducible
runs. Prior to this patch the RNG was seeded with `time(NULL)`. This
made finding faults difficult.
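The seeding policy described above can be illustrated with Python's
standard `random` module standing in for the gem5 random class. The
helper names and the default seed value here are assumptions for the
sketch, not gem5's actual API or default.

```python
import random

# Assumption: a fixed fallback seed, standing in for the gem5 default seed.
DEFAULT_SEED = 0


def make_rng(user_seed=None):
    """Use the user's seed if given, otherwise the fixed default seed."""
    seed = DEFAULT_SEED if user_seed is None else user_seed
    return random.Random(seed)


def draw(rng, count):
    """Draw a short sequence of pseudo-random test choices."""
    return [rng.randrange(100) for _ in range(count)]


# Two runs with the same seed produce the same sequence, unlike a
# time-based seed, so a failing run can be replayed.
same = draw(make_rng(42), 5) == draw(make_rng(42), 5)
```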
This, at least partially, addresses Issue #138
Change-Id: Ia8e9f7b87e91323f828e0b7f6c3906c0c5793b2c
The extended state CPUID function is used to set the values of the XCR0
register as well as to specify the size of the storage used when context
switching x87 and AVX+ state. This function is iterative and therefore
requires (1) marking it as such in the hasSignificantIndex function and
(2) setting multiple 4-tuples for the default CPUID values, where the
last 4-tuple is all zeros.
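A minimal sketch of an iterative CPUID function stored as a list of
4-tuples with an all-zero terminator. The table layout, names, and
register values below are placeholders for illustration only; they are
not gem5's data structures or real CPUID values. Leaf 0xD is the
extended state enumeration leaf.

```python
# Hypothetical table: function number -> list of (EAX, EBX, ECX, EDX),
# one tuple per ECX index. All values are placeholders.
CPUID_DEFAULTS = {
    0xD: [
        (0x1, 0x2, 0x3, 0x4),  # placeholder for ECX index 0
        (0x5, 0x6, 0x7, 0x8),  # placeholder for ECX index 1
        (0, 0, 0, 0),          # all-zero terminator
    ],
}

# Functions whose result depends on the ECX index.
ITERATIVE_FUNCTIONS = {0xD}


def cpuid_lookup(function, index):
    """Return the 4-tuple for a function, honoring the ECX index if iterative."""
    entries = CPUID_DEFAULTS[function]
    if function not in ITERATIVE_FUNCTIONS:
        return entries[0]
    # Indices past the end fall through to the all-zero terminator.
    return entries[min(index, len(entries) - 1)]
```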
Change-Id: Ib6a43925afb1cae75f61d8acff52a3cc26ce17c8
Related to the recent changes with moving CPUID values to python, this
value is needed to enable AVX and needs a way to be exposed to python as
well in order to set the bit and the corresponding CPUID values at the
same time.
Change-Id: I3cadb0fe61ff4ebf6de903018a8d8a411bfdb4e0
Various CPUID functions will return different values depending on the
value of ECX when executing the CPUID instruction. Add support for this
in the X86 KVM CPU. A subsequent patch will add a CPUID function which
requires iterating through multiple ECX values.
Change-Id: Ib44a52be52ea632d5e2cee3fb2ca390b60a7202a
CPUID values for X86 are currently hard-coded in the C++ source file.
This makes it difficult to configure the bits if needed. Move these to
python instead. This will provide a few benefits:
1. We can enable features for certain configurations, for example AVX
can be enabled when the KVM CPU is used, but otherwise should not be
enabled as gem5 does not have full AVX support.
2. We can more accurately communicate things like cache/TLB sizes based
on the actual gem5 configuration. The CPUID values can be used by
some libraries, e.g., MPI, to query system topology.
3. Enabling some bits breaks things in certain configurations and this
can be prevented by configuring in python. For example, enabling AVX
seems to currently be breaking SMP, meaning gem5 can only boot one CPU
in that configuration.
Change-Id: Ib3866f39c86d61374b9451e60b119a3155575884
This fixes #131 by reverting to the old behavior of performing all
atomics at the system level. To do this the SLC bit needs to be set for
all atomic requests.
Change-Id: I63f4e449be1b02c933832d09700237f8c8026f4c
In the memory controller, MemCtrl::MemoryPort::recvFunctional,
when the functional request is satisfied by the ctrl-response queue,
correctly make the packet a response.
This change mirrors AbstractMemory::functionalAccess, which uses
Packet::makeResponse() after satisfying the request.
Change-Id: I47917062d3270915a97eed2c9fade66ba17019eb
In https://gem5-review.googlesource.com/c/public/gem5/+/52047 inst.pc
was changed from an object to a pointer. It is possible that this
pointer is null (e.g., if there is an interrupt and there is a bubble).
Make sure to check that it's not null before printing.
I believe that other places this pointer is dereferenced without an
explicit null check are safe, but I'm not certain.
Should fix #97
Change-Id: Idbe246cfdb62d4d75416d41b451fb3c076233bbc
When compiling with clang-14 I received the following error:
```
src/base/bitfield.hh:328:1: error: function 'findLsbSetFallback' is not needed and will not be emitted [-Werror,-Wunneeded-internal-declaration]
```
This function was introduced in PR #76.
This fixes this compiler warning/error by using `[[maybe_unused]]`.
Change-Id: I0b99eab0a9e42ee1687e7a0594a5a7bf9588b422
FEAT_TLBIOS was introduced by a recent patch [1], which however did
not include the outer shareable case in the Msr disambiguation
switch. This meant the TLBIOS instructions were decoded as normal
MSR instructions, with no effect whatsoever on the TLBs.
[1]: https://gem5-review.googlesource.com/c/public/gem5/+/70567
Change-Id: I41665a4634fbe0ee8cc30dbc5d88d63103082ae9
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Added a GLC atomic latency parameter (glc-atomic-latency) used when
enqueueing response messages regarding atomics directly performed in
the TCC. This latency is added in addition to the L2 response latency
(TCC_latency). This represents the latency of performing an atomic
within the L2.
With this change, the TCC response queue will receive enqueues with
varying latencies as GLC atomic responses will have this added GLC
atomic latency while data responses will not. To accommodate this in
light of the queue having strict FIFO ordering (which would be violated
here), this change also adds an optional parameter bypassStrictFIFO to
the SLICC enqueue function which allows overriding strict FIFO
requirements for individual messages on a case-by-case basis. This
parameter is only being used in the TCC's atomic response enqueue call.
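The interaction between per-message latency and strict FIFO ordering
can be sketched with a toy queue. This is a simplified model under
stated assumptions: the class, its fields, and the way the last arrival
time is tracked are hypothetical and do not reproduce the Ruby message
buffer implementation.

```python
class ToyMessageQueue:
    """Toy queue that rejects out-of-order arrival times unless bypassed."""

    def __init__(self):
        self.messages = []          # list of (arrival_tick, payload)
        self.last_arrival_tick = 0

    def enqueue(self, payload, now, latency, bypass_strict_fifo=False):
        arrival = now + latency
        # Strict FIFO: a message may not arrive before the previous one,
        # unless this particular enqueue opts out of the check.
        if not bypass_strict_fifo and arrival < self.last_arrival_tick:
            raise RuntimeError("strict FIFO violation")
        self.last_arrival_tick = max(self.last_arrival_tick, arrival)
        self.messages.append((arrival, payload))


q = ToyMessageQueue()
q.enqueue("first", now=0, latency=30)   # arrives at tick 30
# A second message with a shorter latency would arrive at tick 15,
# which the strict check rejects; the bypass flag allows it.
q.enqueue("second", now=5, latency=10, bypass_strict_fifo=True)
```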
Change-Id: Iabd52cbd2c0cc385c1fb3fe7bcd0cc64bdb40aac
* base: Enable stl_helpers::operator<< in _formatString
The string format (%s) eventually relies on bare operator<< to
display any type T. This gives the opportunity to use the helpers in
stl_helpers. This patch enables printing enums, pairs, tuples,
vectors, maps and others in a PRINTF debug macro without any extra
manual operation.
Change-Id: I8ac85133ebadcb95354598c1cfe687d8fffb89e2
* base: Add Printer util class to force use of operator<< helpers
Wrapping any value in a Printer instance before using operator<< will
force the use of stl_helpers::operator<<.
Change-Id: I7b505194eeabc3e0721effd9b5ce98f9e151b807
* base: Fix typo in ostream_helpers.hh
Change-Id: I283a5414f3add4f18649b77153dcbcc8661bc81e
* base: Disambiguate null optional representation in ostream helper
Change-Id: I5b093555688566cc405248d3a448a8f3efa67888
* base: Add unit test for std::optional ostream helper
Change-Id: I6fb9ced5e6461de5685638a162b5534e10710e20
* base: Ostream helpers Printer unit test
Change-Id: I11db89e85fd40c12bceecb41cadee78b8e871d7b
* base: Unit test for ostream helpers for pointers and smart ptr
Change-Id: Ifa87e8b69fdd9a4869250ab40311f352e8f54ed9
* base: Coding style fix in ostream_helpers.test.cc
Change-Id: I095c7048fad35e63f979aa601bfc8cde65c9077b
* base: Test shared_ptr in ostream_helpers.test.cc
Change-Id: I553df0614f1dd6eef2061c4dc1794af8c543b78f
---------
Co-authored-by: Gabriel Busnot <gabriel.busnot@arteris.com>
* stdlib,configs,tests: Remove `Resource` class use
This class is deprecated, but was still used in various example
configuration scripts and tests. This patch replaces it with the
`obtain_resource` function.
Change-Id: I0c89bf17783ccaaafc18072aaeefb5d1e207bc55
* configs: Remove `CustomDiskImageResource` use
The class is deprecated but was still used in the SPEC example scripts.
This patch replaces it with the `DiskImageResource` class.
Change-Id: Ie0697fe59a3d737b05eb45ff3bc964f42b0387e0
* configs,tests: Remove `CustomResource` use
This class is deprecated but was still used in example scripts and
mentioned, incorrectly, in comments in the pyunit tests. This patch
removes these.
Change-Id: Icb6d02f47a5b72cd58551e5dcd59cc72d6a91a01
* stdlib: Remove '\' in Workload docstring example
This example shows how to use the Workload. The backslash is not correct Python and would fail if used in this way.
Co-authored-by: Jason Lowe-Power <jason@lowepower.com>
---------
Co-authored-by: Jason Lowe-Power <jason@lowepower.com>
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
* stdlib: Change resource compatibility warning
If the gem5 version is "develop", the warning will not
be thrown.
Change-Id: Id2be1c4323c6ca06c5503c2885c1608f8d119420
* tests: Edit obtain_resources warning test
Since we are editing the warning message for
the develop branch, the test removes the
warning message as well.
Change-Id: I90882340188360bb3435344cdc14b324412c6c0e
---------
Co-authored-by: Jason Lowe-Power <jason@lowepower.com>
* cpu-kvm: Add a variable signifying whether we are using perf
Change-Id: Iaa081e364f85c863f781723b5524d267724ed0e4
Signed-off-by: Hoa Nguyen <hoanguyen@ucdavis.edu>
* cpu-kvm: Making it clear the functionalities are specific to KVM
Change-Id: I982426f294d90655227dc15337bf73c42a260ded
Signed-off-by: Hoa Nguyen <hoanguyen@ucdavis.edu>
* cpu-kvm: Make perf optional
Change-Id: I8973c2a96575383976cea7ca3fda478f83e95c3f
Signed-off-by: Hoa Nguyen <hoanguyen@ucdavis.edu>
* configs: Add an example config of using KVM without perf
Change-Id: Ic69fa7dac4f1a2c8fe23712b0fa77b5b22c5f2df
Signed-off-by: Hoa Nguyen <hoanguyen@ucdavis.edu>
* Apply suggestions from code review
Co-authored-by: Jason Lowe-Power <jason@lowepower.com>
* misc: Add an example to the panic
Change-Id: Ic1fdfb955e5d8b9ad1d4f0a2bf30fa8050deba70
Signed-off-by: Hoa Nguyen <hoanguyen@ucdavis.edu>
* misc: Add warning of not using perf when using KVM CPU
Change-Id: I96c0832fb48c63a79773665ca6228da778ef0497
Signed-off-by: Hoa Nguyen <hoanguyen@ucdavis.edu>
* misc: Fix stuff
Change-Id: Ib407ae7407955b695f0e0f2718324f41bb0d768f
Signed-off-by: Hoa Nguyen <hoanguyen@ucdavis.edu>
* misc: style fix
Change-Id: I7275942e43f46140fdd52c975f76abb3c81b8b0a
Signed-off-by: Hoa Nguyen <hoanguyen@ucdavis.edu>
---------
Signed-off-by: Hoa Nguyen <hoanguyen@ucdavis.edu>
Co-authored-by: Jason Lowe-Power <jason@lowepower.com>
Added support for performing non-SLC-set atomics in the TCC.
Previously, all atomics were being passed on by the TCC to the
directory. With this change, atomics will only be passed on if the SLC
bit is set or if the line isn't present or available in the TCC.
If a non-SLC atomic is passed on to the directory because it is not
present in the TCC, the atomic will be performed on the return path on
the Data event. To accommodate the directory not performing the atomic
in this case, this change also passes the SLC bit on to the directory.
The previously-named "Atomic" action has been renamed to
"AtomicPassOn", with the new "Atomic" corresponding to an atomic
performed directly in the TCC.
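The routing decision described above reduces to a small predicate. The
sketch below only restates that rule; the function name and boolean
arguments are hypothetical, and the real protocol is expressed as SLICC
state transitions rather than a function.

```python
def tcc_atomic_action(slc_bit_set: bool, line_available: bool) -> str:
    """Pick the TCC action for an atomic request, per the rule in the text."""
    # Pass the atomic on to the directory if SLC is set, or if the line
    # is not present or available in the TCC.
    if slc_bit_set or not line_available:
        return "AtomicPassOn"
    # Otherwise perform the atomic directly in the TCC.
    return "Atomic"
```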
Change-Id: Ibf92f71ddceb38bd1b0da70b0a786cc4c3cf2669
In previous versions of gem5, the source files of extra directories
were copied to the build directory for compilation, so it was not a
problem if code in one extra directory included *.h (*.hh) files from
another extra directory.
After the change in
https://gem5-review.googlesource.com/c/public/gem5/+/68758, the
source files of extra directories are no longer copied to the build
directory unless the user compiles gem5 with "--duplicate-sources".
This causes a compilation error if the code includes a header file
from another repository.
For example, assume we want to compile gem5 with the "foo/bar1" and
"foo/bar2" repositories, which are independent of gem5. There are
header files "foo/bar1/a.h", "foo/bar1/b.h" and "foo/bar2/d.h". If
"foo/bar1/sample.c" needs "foo/bar2/d.h", it usually includes the file
by declaring `#include "bar2/d.h"`. This works if --duplicate-sources
is specified in the gem5 build, because the directories are copied to
<builddir>/bar1 and <builddir>/bar2 respectively, and -I<builddir> is
specified by default whether or not duplicate sources is set. Without
--duplicate-sources, compilation fails.
This change aims to make this situation work without
--duplicate-sources by adding the parent directories of the extra
directories to the include paths, placed before the extra directories
themselves. If --duplicate-sources is specified, the parent
directories are not added, to avoid repeated include paths.
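A minimal sketch of the include-path rule, assuming POSIX-style
relative paths. The function name and return shape are hypothetical;
this does not reproduce the SCons build code.

```python
import posixpath


def extra_include_paths(extra_dirs, duplicate_sources):
    """Return include paths contributed by the extra directories."""
    if duplicate_sources:
        # Sources are copied under the build directory, so no parent
        # directories are added (avoids repeated include paths).
        return list(extra_dirs)
    parents = []
    for directory in extra_dirs:
        parent = posixpath.dirname(posixpath.normpath(directory))
        if parent and parent not in parents:
            parents.append(parent)
    # Parent directories come before the extra directories themselves.
    return parents + list(extra_dirs)


print(extra_include_paths(["foo/bar1", "foo/bar2"], duplicate_sources=False))
```

With "foo" on the include path, `#include "bar2/d.h"` in
"foo/bar1/sample.c" resolves to "foo/bar2/d.h".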
Change-Id: I461e1dcb8266d785f1f38eeff77f9d515d47c03d
* base: Fix Memoizer constructor parameter type
* base: switch from new to mk_unq in amo.test.cc
* base: Fix memory management in IniFile
* base: Fix memory management in Trie
* sim: Fix out-of-bounds access in CheckpointIn::setDir
Change-Id: Iac50bbf01b6d7acc458c786da8ac371582a4ce09
---------
Co-authored-by: Gabriel Busnot <gabriel.busnot@arteris.com>
Added dummy definition of __has_builtin to bitfield.hh's hasBuiltinCtz,
which is already being done in popCount.
Change-Id: I4a1760a142209462bb807c6df4bc868284b6f5f3
* base: Generalize findLsbSet to std::bitset<N>
* base: Split builtin and fallback implementations of findLsbSet
* base: Add more unit testing for findLsbSet
Change-Id: Id75dfb7d306c9a8228fa893798b1b867137465a9
---------
Co-authored-by: Gabriel Busnot <gabriel.busnot@arteris.com>
* misc: Update README to README.md
This change converts the text-based README to markdown. This works
better with modern source-control systems, most notably, GitHub.
The README.md has been broken down into sections to better organize the
document.
The document now includes expanded information on reporting bugs and
requesting features.
Due to renaming 'README' to 'README.md', this code was generating the
following for "info.py":
```
README.md = "<FILE CONTENTS HERE>"
```
This is invalid, as '.' is used to access member variables/methods in
Python. To fix this, "infopy.py" now replaces "." with "_". As such,
the generated line in "info.py" is now:
```
README_MD = "<FILE CONTENTS HERE>"
```
This puts GitHub Discussions and GitHub Issues towards the top of the
list. This is to incentivize their usage.
Change-Id: I18018ba23493f43861544497f23ec59f1e8debe1
---------
Co-authored-by: Jason Lowe-Power <jason@lowepower.com>
The latest protobuf library depends on the abseil libraries. We should
rely on pkgconfig to give us the correct dependencies. We still keep the
old check as a fallback.
Change-Id: I529ea1f61e5bbc16b2520ab1badff3d8264f1c33
AMD GCN3 and Vega GPUs assume a max of 16 WG/CU. Any GPU WG with more
than 1 WF requires a hardware barrier to allow WFs in the WG to
synchronize locally. However, the default gem5 GPU configuration
currently assumes only 4 barriers per CU, which artificially prevents
more than 4 such WGs per CU from running simultaneously, even when
they otherwise could.
This fix resolves this by updating the default number of hardware barriers
per CU to 16, which mimics the support described in slide 39 here:
https://www.olcf.ornl.gov/wp-content/uploads/2019/10/ORNL_Application_Readiness_Workshop-AMD_GPU_Basics.pdf
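The effect of the barrier count on occupancy can be shown with a small
calculation. This sketch assumes every WG has more than one WF (so each
needs a barrier) and ignores other per-CU limits such as registers and
LDS; the function name is hypothetical.

```python
def max_concurrent_wgs(num_barriers: int, max_wg_per_cu: int = 16) -> int:
    """Upper bound on multi-WF WGs resident on one CU at the same time."""
    # Each such WG needs its own hardware barrier, so the smaller of the
    # two limits wins.
    return min(num_barriers, max_wg_per_cu)


print(max_concurrent_wgs(4))   # 4: limited by the old default
print(max_concurrent_wgs(16))  # 16: matches the WG/CU limit
```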
Change-Id: Ib7636a13359d998e676c1790f436a83ce88cbfc0
This change adds a new file to m5out which is citations.bib.
This file will contain the citations to the papers which describe the
aspects of the gem5 simulator that the simulation uses. In other words,
each simulation configuration could generate a different bib file
referencing different works.
Each SimObject can now have a set of citations associated with it. After
the system is built (in `instantiate`), the citations.bib file is
created by parsing all SimObjects that have been instantiated and taking
the union of their associated citations.
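The union step can be sketched as below. The class names, the
`citations` attribute, and the BibTeX entries are placeholders invented
for illustration; they are not the gem5 SimObject API or real
references.

```python
class SimObjectSketch:
    """Stand-in for a SimObject carrying a set of BibTeX entries."""
    citations = frozenset()


class CacheSketch(SimObjectSketch):
    citations = frozenset({"@misc{placeholder-a, title={Placeholder A}}"})


class CpuSketch(SimObjectSketch):
    citations = frozenset({
        "@misc{placeholder-a, title={Placeholder A}}",
        "@misc{placeholder-b, title={Placeholder B}}",
    })


def build_bib(instantiated_objects) -> str:
    """Union the citations of all instantiated objects into one bib text."""
    entries = set()
    for obj in instantiated_objects:
        entries |= obj.citations
    # Sorting keeps the output stable between runs.
    return "\n\n".join(sorted(entries))


bib_text = build_bib([CacheSketch(), CpuSketch(), CacheSketch()])
```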
This commit is not meant to add all citations, but to act as an example
for others to add more citations to gem5.
Change-Id: Icd5c46fd9ee44adbeec1fea162657f5716f7e5ef
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Added WIB (Waiting on Writethrough Ack; Will be Bypassed) state which
is transitioned to when a dirty line in the TCC is evicted in a
bypassed read. Previously, we were transitioning to invalid.
While a WI (Waiting on Writethrough Ack) state exists, the transition
from it on WBAck deallocates the TBE, which contains SLC bit information
needed to trigger the Bypass event when the read response from the
directory comes in.
Without this change, WB acknowledgements from the directory in read
bypass evicts (with the SLC bit set) were being treated as if they were
read responses, leading to an invalid transition panic.
Change-Id: I703c3fe8af0366856552bb677810cb1a8f2896de
This patch changes the way memory ranges are divided when using
multiple cores for linear traffic. Currently, the same range is
assigned to multiple linear generators, so all the cores start
generating the same trace. This patch divides the overall range
assigned to the generator ([min_addr:max_addr]) between the cores.
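One way to divide the range is into equal, contiguous chunks, one per
core. This is a sketch under that assumption (the function name is
hypothetical and any remainder from the integer division is left
unassigned); the actual patch may split the range differently.

```python
def partition_range(min_addr: int, max_addr: int, num_cores: int):
    """Split [min_addr, max_addr) into one equal chunk per core."""
    chunk = (max_addr - min_addr) // num_cores
    return [
        (min_addr + i * chunk, min_addr + (i + 1) * chunk)
        for i in range(num_cores)
    ]


# Each core's linear generator gets its own (start, end) pair.
ranges = partition_range(0, 1024, 4)
```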
Change-Id: I49f69b3d61b590899f8d54ee3be997ad22d7fa9b
Co-authored-by: Jason Lowe-Power <jason@lowepower.com>
Co-authored-by: mkjost0 <50555529+mkjost0@users.noreply.github.com>
Co-authored-by: Bobby R. Bruce <bbruce@ucdavis.edu>
When shiftAmt is 0 for a UQRSHL instruction, the code called bits() with
incorrect arguments. This fixes a left-shift of 0 to be a NOP/mov, as
required.
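For reference, a simplified scalar model of an unsigned saturating
rounding shift left on a single element. It is a sketch of the
instruction's arithmetic only (no saturation flag, no vector lanes) and
is not the gem5 implementation; the function name and `esize` parameter
are assumptions.

```python
def uqrshl(value: int, shift_amt: int, esize: int = 8) -> int:
    """Unsigned saturating rounding shift left of one esize-bit element."""
    max_val = (1 << esize) - 1
    assert 0 <= value <= max_val
    if shift_amt == 0:
        # A shift of zero leaves the operand unchanged (NOP/mov).
        return value
    if shift_amt > 0:
        # Left shift, saturating at the element's maximum value.
        return min(value << shift_amt, max_val)
    # Negative amounts are rounding right shifts.
    right = -shift_amt
    return (value + (1 << (right - 1))) >> right
```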
Change-Id: Ic86ca40ac42bfb767a09e8c65a53cec56382a008
Co-authored-by: Marton Erdos <marton.erdos@arm.com>