Added WIB (Waiting on Writethrough Ack; Will be Bypassed) state which
is transitioned to when a dirty line in the TCC is evicted in a
bypassed read. Previously, we were transitioning to invalid.
While a WI (Waiting on Writethrough Ack) state exists, transitions from
it on WBAck deallocates the TBE, which contains SLC bit information
needed to trigger the Bypass event when the read response from the
directory comes in.
Without this change, WB acknowledgements from the directory in read
bypass evicts (with the SLC bit set) were being treated as if they were
read responses, leading to an invalid transition panic.
Change-Id: I703c3fe8af0366856552bb677810cb1a8f2896de
This patch changes the way memory ranges are devided when using
multiple cores for linear traffic. The current state assigns the
same range to multiple linear generators so all the cores start
generating the same trace. This patch devides the overall range
assigned to the generator ([min_addr:max_addr]) between the cores.
Change-Id: I49f69b3d61b590899f8d54ee3be997ad22d7fa9b
Co-authored-by: Jason Lowe-Power <jason@lowepower.com>
Co-authored-by: mkjost0 <50555529+mkjost0@users.noreply.github.com>
Co-authored-by: Bobby R. Bruce <bbruce@ucdavis.edu>
When shiftAmt is 0 for a UQRSHL instruction, the code called bits() with
incorrect arguments. This fixes a left-shift of 0 to be a NOP/mov, as
required.
Change-Id: Ic86ca40ac42bfb767a09e8c65a53cec56382a008
Co-authored-by: Marton Erdos <marton.erdos@arm.com>
* gpu-compute: Remove use of 'std::random_shuffle'
This was deprecated in C++14 and removed in C++17. This has been
replaced with std::random. This has been implemented to ensure
reproducible results despite (pseudo)random behavior.
Change-Id: Idd52bc997547c7f8c1be88f6130adff8a37b4116
* dev-amdgpu: Add missing 'overrides'
This causes warnings/errors in some compilers.
Change-Id: I36a3548943c030d2578c2f581c8985c12eaeb0ae
* dev: Fix Linux specific includes to be portable
This allows for compilation in non-linux systems (e.g., Mac OS).
Change-Id: Ib6c9406baf42db8caaad335ebc670c1905584ea2
* tests: Add 'VEGA_X86' build target to compiler-tests.sh
Change-Id: Icbf1d60a096b1791a4718a7edf17466f854b6ae5
* tests: Add 'GCN3_X86' build target to compiler-tests.sh
Change-Id: Ie7c9c20bb090f8688e48c8619667312196a7c123
This operator can be safely brought in scope when needed with "using
stl_helpers::operator<<".
In order to provide a specialization for operator<< with
stl_helpers-enabled types without loosing the hability to use it with
other types, a dual-dispatch mechanism is used. The only entry point
in the system is through a primary dispatch function that won't
resolve for non-helped types. Then, recursive calls go through the
secondary dispatch interface that sort between helped and non-helped
types. Helped typed will enter the system back through the primary
dispatch interface while other types will look for operator<< through
regular lookup, especially ADL.
Change-Id: I1609dd6e85e25764f393458d736ec228e025da32
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67666
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Currently, gem5 suffers from several bugs related
to Python interpreter's locale encoding issues.
gem5 will crash when the working directory contains
Non-ASCII characters.
The reason is that Python 3.8+ introduces a new
interpreter startup sequence [1]. The startup
sequence consists of three phases:
1. Python core runtime preinitialization
2. Python core runtime initialization
3. Main interpreter configuration
Stage 1 determining the encodings used for system
interfaces.
However, gem5 doesn't preinitialize the Python
interpreter. Thus, the locale settings do not take
effect. This patch preinitialize the Python for
Python 3.8+.
Also, this patch avoid the use of `Py_SetProgramName`,
which is deprecated since Python 3.11[3].
[1] https://peps.python.org/pep-0432/
[2] https://peps.python.org/pep-0587/
[3] https://docs.python.org/3/c-api/init.html#c.Py_SetProgramName
Change-Id: I08a2ec6ab2b39a95ab194909932c8fc578c745ce
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70898
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Roger Chang <rogerycchang@google.com>
The PCI read/write functions are atomic functions in gem5, meaning they
expect a response with a latency value on the same simulation Tick. For
reads to a PCI device, the response must also include a data value read
from the device.
The AMDGPU device has a PCI BAR which mirrors the frame buffer memory.
Currently reads are done atomically, but writes are sent to a DMA device
without waiting for a write completion ACK. As a result, it is possible
that writes can be queued in the DMA device long enough that another
read for a queued address arrives. This happens very deterministically
with the AtomicSimpleCPU and causes GPUFS to break with that CPU.
This change makes writes to the frame BAR atomic the same as reads. This
avoids that problem and as a result the AtomicSimpleCPU can now load the
driver for GPUFS simulations.
Change-Id: I9a8e8b172712c78b667ebcec81a0c5d0060234db
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71898
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
(cherry picked from commit 079fc47dc2)
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/72079
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
The unconditional exit event when a kernel completes that was added in
c644eae2dd is causing scripts that do not
ignore unknown exit events to end simulation prematurely. One such
script is the apu_se.py script used in SE mode GPU simulation. Make this
exit conditional to the parameter being set to a valid value to avoid
this problem.
Change-Id: I1d2c082291fdbcf27390913ffdffb963ec8080dd
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/72098
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
These types include std::pair, std::tuple, all iterable types and any
composition of these. Convenience hash factory and computation
functions are also provided.
These functions are in the stl_helpers namespace and must not move to
::std which could cause undefined behaviour. This is because
specialization of std templates for std or native types (or
composition of these) is undefined behaviour. This inconvenience can't
be circumvented for generic code. Users are free to bring these hash
implementations to namespace std after specialization for their own
non-std and non-native types.
Change-Id: Ifd0f0b64e5421d5d44890eb25428cc9c53484eb3
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67663
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Daniel Carvalho <odanrc@yahoo.com.br>
Tested-by: kokoro <noreply+kokoro@google.com>
Prior to this patch, when a memory controller was failing at sending a
response to AbstractController, it would not wakeup until the next
request. This patch gives the opportunity to Ruby models to notify
memory response buffer dequeue so that AbstractController can send a
retry request if necessary.
A dequeueMemRspQueue function has been added AbstractController to
automate the dequeue+notify operation.
Note that models that don't notify AbstractController will continue
working as before.
Change-Id: I261bb4593c126208c98825e54f538638d818d16b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67658
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
When you build Python from scratch, the modules would be separated
shared libraries. They would be dlopen when doing module import. To make
the separated shared libraries can share the symbol in the binary, we
should add -rdynamic when compliing.
Change-Id: I26bf9fd7ea5068fd2d08c8f059b37ff34073e8c2
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/72040
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Vega adds multiple new D16 instructions which load a byte or short into
the lower or upper 16 bits of a register for packed math. The decoder
table has subDecode tables for FLAT instructions which represents 32
opcodes in each subDecode table. The subDecode table for opcodes 32-63
is missing so it is added here.
The opcode for V_SWAP_B32 is also off by one- In the ISA manual this
instruction is opcode 81, the instruction before is 79, and there is no
opcode 80, so the decoder entry is swapped with the invalid decoding
below it.
Change-Id: I278fea574ea684ccc6302d5b4d0f5dd8813a88ad
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71899
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The PCI read/write functions are atomic functions in gem5, meaning they
expect a response with a latency value on the same simulation Tick. For
reads to a PCI device, the response must also include a data value read
from the device.
The AMDGPU device has a PCI BAR which mirrors the frame buffer memory.
Currently reads are done atomically, but writes are sent to a DMA device
without waiting for a write completion ACK. As a result, it is possible
that writes can be queued in the DMA device long enough that another
read for a queued address arrives. This happens very deterministically
with the AtomicSimpleCPU and causes GPUFS to break with that CPU.
This change makes writes to the frame BAR atomic the same as reads. This
avoids that problem and as a result the AtomicSimpleCPU can now load the
driver for GPUFS simulations.
Change-Id: I9a8e8b172712c78b667ebcec81a0c5d0060234db
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71898
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
TracingExtension contains a stack recording the port names
passed through of the Packet. The target receiving the Packet
can dump out the whole path of this Packet for the debug purpose.
This mechanism can be enabled with the debug flag PortTrace.
Change-Id: Ic11e708b35fdddc4f4b786d91b35fd4def08948c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71538
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Yu-hsin Wang <yuhsingw@google.com>
* According to the manual, load reservations must be cleared on a
failed or a successful SC attempt.
* A load reservation can be arbitrarily large. The current
implementation was reserving something different than cacheBlockSize
which could lead to problems if snoop addresses are cache block
aligned. This patch implementation assumes a cacheBlock granularity.
* Load reservations should also be cleared on faults
Change-Id: I64513534710b5f269260fcb204f717801913e2f5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71520
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
This patch includes several changes to the gem5 tools interface to the
gem5-resources infrastructure. These are:
* The old download and JSON query functions have been removed from the
downloader module. These functions were used for directly downloading
and inspecting the resource JSON file, hosted at
https://resources.gem5.org/resources. This information is now obtained
via `gem5.client`. If a resources JSON file is specified as a client,
it should conform to the new schema:
https//resources.gem5.org/gem5-resources-schema.json. The old schema
(pre-v23.0) is no longer valid. Tests have been updated to reflect
this change. Those which tested these old functions have been removed.
* Unused imports have been removed.
* For the resource query functions, and those tasked with obtaining the
resources, the parameter `gem5_version` has been added. In all cases
it does the same thing:
* It will filter results based on compatibility to the
`gem5_version` specified. If no resources are compatible the
latest version of that resource is chosen (though a warning is
thrown).
* By default it is set to the current gem5 version.
* It is optional. If `None`, this filtering functionality is not
carried out.
* Tests have been updated to fix the version to “develop” so the
they do not break between versions.
* The `gem5_version` parameters will filter using a logic which will
base compatibility on the specificity of the gem5-version specified in
a resource’s data. If a resource has a compatible gem5-version of
“v18.4” it will be compatible with any minor/hotfix version within the
v18.4 release (this can be seen as matching on “v18.4.*.*”.) Likewise,
if a resource has a compatible gem5-version of “v18.4.1” then it’s
only compatible with the v18.4.1 release but any of it’s hot fix
releases (“v18.4.1.*”).
* The ‘list_resources’ function has been updated to use the
“gem5.client” APIs to get resource information from the clients
(MongoDB or a JSON file). This has been designed to remain backwards
compatible to as much as is possible, though, due to schema changes,
the function does search across all versions of gem5.
* `get_resources` function was added to the `AbstractClient`. This is a
more general function than `get_resource_by_id`. It was
primarily created to handle the `list_resources` update but is a
useful update to the API. The `get_resource_by_id` function has been
altered to function as a wrapped to the `get_resources` function.
* Removed “GEM5_RESOURCE_JSON” code has been removed. This is no longer
used.
* Tests have been cleaned up a little bit to be easier to read.
* Some docstrings have been updated.
Things that are left TODO with this code:
* The client_wrapper/client/abstract_client abstractions are rather
pointless. In particular the client_wrapper and client classes could
be merged.
* The downloader module no longer does much and should have its
functions merged into other modules.
* With the addition of the `get_resources` function, much of the code in
the `AbstractClient` could be simplified.
Change-Id: I0ce48e88b93a2b9db53d4749861fa0b5f9472053
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71506
Reviewed-by: Kunal Pai <kunpai@ucdavis.edu>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
(cherry picked from commit 82587ce71b)
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71739
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
* According to the manual, load reservations must be cleared on a
failed or a successful SC attempt.
* A load reservation can be arbitrarily large. The current
implementation was reserving something different than cacheBlockSize
which could lead to problems if snoop addresses are cache block
aligned. This patch implementation assumes a cacheBlock granularity.
* Load reservations should also be cleared on faults
Change-Id: I64513534710b5f269260fcb204f717801913e2f5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71558
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Roger Chang <rogerycchang@google.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The cache is modeled after an AMD EPYC cache, but not exactly
like AMD EPYC cache.
- K cores per core complex (CCD), each core has one private split L1,
and one private L2.
- K cores in the same CCD share 1 slice of L3 cache, which is not
a victim cache.
- There can be multiple CCDs, which communicate with each other via
Cross-CCD router. The Cross-CCD rounter is also connected to
directory controllers and dma controllers.
- All links latency are set to 1.
Change-Id: Ib64248bed9155b8e48e5158ffdeebf1f2d770754
Signed-off-by: Hoa Nguyen <hoanguyen@ucdavis.edu>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71598
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Many of the outstanding issues with the GPU model are related to
instructions not having SDWA/DPP implementations and executing by
ignoring the special registers leading to incorrect executiong.
Adding SDWA/DPP is current very cumbersome as there is a lot of
boilerplate code.
This changeset adds helper methods for VOP2 with one instruction
changed as an example. This review is intended to get feedback
before applying this change to all VOP2 instructions that support
SDWA/DPP.
Change-Id: I1edbc3f3bb166d34f151545aa9f47a94150e1406
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70738
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This patch includes several changes to the gem5 tools interface to the
gem5-resources infrastructure. These are:
* The old download and JSON query functions have been removed from the
downloader module. These functions were used for directly downloading
and inspecting the resource JSON file, hosted at
https://resources.gem5.org/resources. This information is now obtained
via `gem5.client`. If a resources JSON file is specified as a client,
it should conform to the new schema:
https//resources.gem5.org/gem5-resources-schema.json. The old schema
(pre-v23.0) is no longer valid. Tests have been updated to reflect
this change. Those which tested these old functions have been removed.
* Unused imports have been removed.
* For the resource query functions, and those tasked with obtaining the
resources, the parameter `gem5_version` has been added. In all cases
it does the same thing:
* It will filter results based on compatibility to the
`gem5_version` specified. If no resources are compatible the
latest version of that resource is chosen (though a warning is
thrown).
* By default it is set to the current gem5 version.
* It is optional. If `None`, this filtering functionality is not
carried out.
* Tests have been updated to fix the version to “develop” so the
they do not break between versions.
* The `gem5_version` parameters will filter using a logic which will
base compatibility on the specificity of the gem5-version specified in
a resource’s data. If a resource has a compatible gem5-version of
“v18.4” it will be compatible with any minor/hotfix version within the
v18.4 release (this can be seen as matching on “v18.4.*.*”.) Likewise,
if a resource has a compatible gem5-version of “v18.4.1” then it’s
only compatible with the v18.4.1 release but any of it’s hot fix
releases (“v18.4.1.*”).
* The ‘list_resources’ function has been updated to use the
“gem5.client” APIs to get resource information from the clients
(MongoDB or a JSON file). This has been designed to remain backwards
compatible to as much as is possible, though, due to schema changes,
the function does search across all versions of gem5.
* `get_resources` function was added to the `AbstractClient`. This is a
more general function than `get_resource_by_id`. It was
primarily created to handle the `list_resources` update but is a
useful update to the API. The `get_resource_by_id` function has been
altered to function as a wrapped to the `get_resources` function.
* Removed “GEM5_RESOURCE_JSON” code has been removed. This is no longer
used.
* Tests have been cleaned up a little bit to be easier to read.
* Some docstrings have been updated.
Things that are left TODO with this code:
* The client_wrapper/client/abstract_client abstractions are rather
pointless. In particular the client_wrapper and client classes could
be merged.
* The downloader module no longer does much and should have its
functions merged into other modules.
* With the addition of the `get_resources` function, much of the code in
the `AbstractClient` could be simplified.
Change-Id: I0ce48e88b93a2b9db53d4749861fa0b5f9472053
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71506
Reviewed-by: Kunal Pai <kunpai@ucdavis.edu>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The uint_fast16_t is the integer at least 16 bits size, it can be
32, 64 bits and more. Usually most of the simulations are in the
x86-64 linux host, the size of uint_fast16_t is 64 bits. Therefore,
there is no problem for double precision float operations and it can
pass FloatMM test. However, in the Mac OS, the size of uint_fast16_t
is 16 bits, it will lose the upper bits when converting float
register bits to freg_t and it will generate unexpected results for
FloatMM test.
The change can guarantee that the size of data in freg_t is at least
64 bits and it will not lose any data from floating point to freg_t.
Reference:
https://developer.apple.com/documentation/kernel/uint_fast16_thttps://codebrowser.dev/glibc/glibc/stdlib/stdint.h.html
Change-Id: I3df6610f0903cdee0f56584d6cbdb51ac26c86c8
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71519
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
The uint_fast16_t is the integer at least 16 bits size, it can be
32, 64 bits and more. Usually most of the simulations are in the
x86-64 linux host, the size of uint_fast16_t is 64 bits. Therefore,
there is no problem for double precision float operations and it can
pass FloatMM test. However, in the Mac OS, the size of uint_fast16_t
is 16 bits, it will lose the upper bits when converting float
register bits to freg_t and it will generate unexpected results for
FloatMM test.
The change can guarantee that the size of data in freg_t is at least
64 bits and it will not lose any data from floating point to freg_t.
Reference:
https://developer.apple.com/documentation/kernel/uint_fast16_thttps://codebrowser.dev/glibc/glibc/stdlib/stdint.h.html
Change-Id: I3df6610f0903cdee0f56584d6cbdb51ac26c86c8
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71578
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>