X86 had a private/arch specific request flag called StoreCheck which it
used to signal to the TLB that it should fault on a load if it would
have faulted had it been a store. That way, you can detect whether a
read-modify-write type of operation is going to fail due to a
translation problem during the read, and don't have to worry about not
doing anything architecturally visible until the store had succeeded,
while also making sure not to do the store part if the modify part
could fail.
It seems that Ruby had hijacked that flag and had an architecture
specific check which was looking for a load which was going to be
followed by a store. The x86 flag was never intended to communicate that
beyond the TLB, and this nominally architecture agnostic component
shouldn't be reaching into the ISA specific flags to try to get that
information.
Instead, this change introduces a new Request flag called
READ_MODIFY_WRITE which is used for the same purpose in x86, but in
general means that a load will be followed by a write in the near
future.
With this new globally applicable flag, the ruby Sequencer class no
longer needs to check what the arch is, nor does it need to access ISA
private data in the request flags. Always doing this check should be no
less efficient than before, because checking the arch involved calling
into the system object, while checking the flag only requires masking a
bit on the flags which the compiler probably already has floating around
for other logic in this function.
Change-Id: Ied5b744d31e7aa8bf25e399b6b321f9d2020a92f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/48710
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Maintainer: Gabe Black <gabe.black@gmail.com>
This code will never be executed since FULL_SYSTEM is not part of the
build environment (and hasn't been for many years), and on top of that,
this declaration redundantly (and incompletely) tries to set up the
X86PagetableWalker that the ISA already sets up.
Change-Id: I40cffbd7f60c1f741b1a14d9009f80185c9ce28c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/49405
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Add support for LDS accesses by allowing Flat instructions to dispatch
into the local memory pipeline if the requested address is in the group
aperture.
This requires implementing LDS accesses in the Flat initMemRead/Write
functions, in a similar fashion to the DS functions of the same name.
Because we now can potentially dispatch to the local memory pipeline,
this change also adds a check to regain any tokens we requested as a
flat instruction.
Change-Id: Id26191f7ee43291a5e5ca5f39af06af981ec23ab
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/48343
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Currently, we are storing coalesced accesses in
an std::unordered_map indexed by a tick index, i.e.
issue tick / coalescing window. If there are
multiple coalesced requests, at different tick
indexes, to the same virtual address, then the
TLB coalescer will issue just the first one.
However, std::unordered_map is not a sorted
container and we issue coalesced requests by iterating
through such container. This means that the coalesced
request sent in TLBCoalescer::processProbeTLBEvent is
not necessarly the oldest one. Because of this, in
cases of high contention the oldest coalesced request
will have a huge TLB access latency.
To fix this issue, we will use an std::map which is
a sorted container and therefore guarantees the
oldest coalesced request will be sent first.
Change-Id: I9c7ab32c038d5e60f6b55236266a27b0cae8bfb0
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/48340
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The proxies are not used on the critical path, and it's usually implicit
whether they should be the FS or SE version.
Ideally in the future we won't need to worry about which version we need
to use, but the differences haven't quite been abstracted away, and
occasionally we need to decide between the two.
Change-Id: Idb363d6ddc681f7c1ad5e7aba69865f40aa30dc8
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/45907
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This patch adds support for a gfx902 Vega APU, ripping the
appropriate values for device_id from the ROCm Thunk
(src/topology.c).
Note: gfx902 isn't officially supported by ROCm. This
means that it may not work for all programs. In particular,
rocBLAS is incompatible with gfx902, so anything that uses
rocBLAS won't be able to run with gfx902.
Change-Id: I48893e7cc9c7e52275fdfd22314f371a9db8e90a
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47530
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Remove the duplicate dmaVirt calls from HSA packet processor and GPU
command processor and move them into their own class. This removes some
duplicate code and allows a DmaVirtDevice to be created which will be
useful for upcoming full system GPU commits.
The DmaVirtDevice is an abstraction of the base DmaDevice but iterates
using ChunkGenerator over virtual addresses. Classes which inherit from
DmaVirtDevice must provide a translation function to translate from
virtual address to physical address. Once translated, the physical
address is passed to DmaDevice to do the work.
Change-Id: Idd59ccb4d9ba21c0b1150ee328ededf5a88d824e
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47179
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The apertures for non-gfx801 GPUs are set differently.
If the apertures aren't set properly, ROCm will error out.
This change sets the apertures appropriately based on the
gfx version of the simulated GPU. It also adds in new
functions to set the scratch and lds apertures in GFX9 to mimic
the linux kernel.
Change-Id: I1fa6f60bc20c7b6eb3896057841d96846460a9f8
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47529
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
When the Vega ISA got committed, it lacked the request counter
tracking for memory requests that existed in the GCN3 code.
Instead of copying over the same lines from the GCN3 code to the Vega
code, this commit makes the various memory pipelines handle updating the
request counter information instead, as every memory instruction calls a
memory pipeline.
This commit also adds an issueRequest in scalar_memory_pipeline, as
previously, the gpuDynInsts were explicitly placed in the queue of
issuedRequests.
Change-Id: I5140d3b2f12be582f2ae9ff7c433167aeec5b68e
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/45347
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
vector_register_file uses the exec_mask of a memory instruction in
order to determine if it should mark a register as in-use or not.
Previously, the exec_mask of memory instructions was only set on
execution of that instruction, which occurs after the code in
vector_register_file. This led to the code reading potentially garbage
data, leading to a scenario where a register would be marked used when
it shouldn't be.
This fix sets the exec_mask of memory instructions in schedule_stage,
which works because the only time the wavefront execMask() is updated is
on a instruction executing, and we know the previous instruction will
have executed by the time schedule_stage executes, due to the order the
pipeline is executed in.
This also undoes part of a patch from last year (62ec973) which treated
the symptom of accidental register allocation, without preventing the
registers from being allocated in the first place.
This patch also removes now redundant code that sets the exec_mask in
instructions.cc for memory instructions
Change-Id: Idabd35020000764fb06133ac2458606c1aaf6f04
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/45346
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Apply the gem5 namespace to the codebase.
Some anonymous namespaces could theoretically be removed,
but since this change's main goal was to keep conflicts
at a minimum, it was decided not to modify much the
general shape of the files.
A few missing comments of the form "// namespace X" that
occurred before the newly added "} // namespace gem5"
have been added for consistency.
std out should not be included in the gem5 namespace, so
they weren't.
ProtoMessage has not been included in the gem5 namespace,
since I'm not familiar with how proto works.
Regarding the SystemC files, although they belong to gem5,
they actually perform integration between gem5 and SystemC;
therefore, it deserved its own separate namespace.
Files that are automatically generated have been included
in the gem5 namespace.
The .isa files currently are limited to a single namespace.
This limitation should be later removed to make it easier
to accomodate a better API.
Regarding the files in util, gem5:: was prepended where
suitable. Notice that this patch was tested as much as
possible given that most of these were already not
previously compiling.
Change-Id: Ia53d404ec79c46edaa98f654e23bc3b0e179fe2d
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/46323
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
As part of recent decisions regarding namespace
naming conventions, all namespaces will be changed
to snake case.
::Stats became ::statistics.
"statistics" was chosen over "stats" to avoid generating
conflicts with the already existing variables (there are
way too many "stats" in the codebase), which would make
this patch even more disturbing for the users.
Change-Id: If877b12d7dac356f86e3b3d941bf7558a4fd8719
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/45421
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
As part of recent decisions regarding namespace
naming conventions, all namespaces will be changed
to snake case.
sim_clock::Int became sim_clock::as_int.
"as_int" was chosen because "int" is a reserved
keyword, and this namespace acts as a selector of
how to read the internal variables.
Another possibility to resolve this would be to
remove the namespaces "Float" and "Int" and use
unions instead.
Change-Id: I65f47608d2212424bed1731c7f53d242d5a7d89a
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/45436
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Hoa Nguyen <hoanguyen@ucdavis.edu>
Maintainer: Gabe Black <gabe.black@gmail.com>
GPU MTYPE is currently set using a global config passed to the
PACoalescer. This patch enables MTYPE to be set by the shader on a
per-request bases. In real hardware, the MTYPE is extracted from a
GPUVM PTE during address translation. However, our current simulator
only models x86 page tables which do not have the appropriate bits for
GPU MTYPES. Rather than hacking non-x86 bits into our x86 page table
models, this patch instead keeps an interval tree of all pages that
request custom MTYPES in the driver itself. This is currently
only used to map host pages to the GPU as uncacheable, but is easily
extensible to other MTYPES.
Change-Id: I7daab0ffae42084b9131a67c85cd0aa4bbbfc8d6
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42216
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
New topology ripped from Fiji to support dGPU. A dGPU flag is added to
the config which is propogated to the driver. The emulated driver is
now able to properly deal with dGPU ioctls and mmaps. For now, dGPU
physical memory is allocated from the host, but this is easy to change
once we get a GPU memory controller up and running.
Change-Id: I594418482b12ec8fb2e4018d8d0371d56f4f51c8
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42214
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>