Add logic to collect pointers to all GPU TLBs in full system. Implement
the invalid TLBs PM4 packet. The invalidate is done functionally since
there is really no benefit to simulate it with timing and there is no
support in the TLB to do so. This allow application with much larger
data sets which may reuse device memory pages to work in gem5 without
possibly crashing due to a stale translation being leftover in the TLB.
Change-Id: Ia30cce02154d482d8f75b2280409abb8f8375c24
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/58470
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This makes what are configuration and what are internal SCons variables
explicit and separate, and makes it unnecessary to call out what
variables to export to C++.
These variables will also be plumbed into and out of kconfiglib in later
changes.
Change-Id: Iaf5e098d7404af06285c421dbdf8ef4171b3f001
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/56892
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Currently if a Ruby functional access fails to find an address in the
caches, it gives up. For functional page table walks we need to be able
to go all the way to memory. This adds a pointer to the system object
which allows the walker to get a pointer to device memory which can be
used to do a functional access directly to memory bypassing Ruby.
Change-Id: I0ead6e5e130a0d53021c44ae9221b167c6316ab2
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57529
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Create a VM class to reduce clutter in the amdgpu_device.* files. This
new file is in charge of reading/writting MMIOs related to VM contexts
and apertures. It also provides ranges checks for various apertures and
breaks out the MMIO interface so that there are not overloaded macro
definitions in the device MMIO methods.
The new translation generator classes for the various apertures are also
added to this class.
Change-Id: Ic224c1aa485685685b1136a46eed50bcf99d2350
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53066
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The executed_as field is currently not set for global memory
instructions. This results in the default of SC_NONE, causing the status
vector to be all zeros. The GM pipe sees this and completes the
instruction immediately rather than issuing memory requests. This is
fixed by marking the instruction as executed as SC_GLOBAL always. Flat
instructions use resolvedFlatSegment for this, however since global
instructions are known to be global we can set this field directly. This
results in the expected issuing of memory requests to GPU memory.
Change-Id: Ic23102853ccd49a41e2f083b7bb24f033dfed18a
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57829
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Add the page table walker, page table format, TLB, TLB coalescer, and
associated support in the AMDGPUDevice. This page table format used the
hardware format for dGPU and is very different from APU/GCN3 which use
the X86 page table format.
In order to support either format for the GPU model, a common
TranslationState called GpuTranslation state is created which holds the
combined fields of both the APU and Vega translation state. Similarly
the TlbEntry is cast at runtime by the corresponding arch files as they
are the only files which touch the internals of the TlbEntry. The GPU
model only checks if a TlbEntry is non-null and thus does not need to
cast to peek inside the data structure.
Change-Id: I4484c66239b48df5224d61caa6e968e56eea38a5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51848
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The offset field in Flat-style instructions is treated differently
based on if the instruction is Flat or Global/Scratch.
In Flat insts, the offset is treated as a 12-bit unsigned number.
In Global/Scratch insts, the offset is treated as a 13-bit signed number.
This patch updates the calcAddr function for Flat-style instructions
to properly sign-extend the offset on Global/Scratch instructions
Change-Id: I57f10258c23d900da9bf6ded6717c6e8abd177b7
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57209
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
The stack size is something that applies to addresses when performing
accesses as part of some instructions. This was handled inconsistently
or incompletely or simply incorrectly in a few ways.
First, when pushing or popping from the stack, the *address size* should
be set to the stack size. The data size is generally the operand size.
When the stack pointer is incremented/decremented, it should be changed
by the data size. When a stack pointer is manipulated, the data size
for those calculations should be the stack size. Importantly that does
not change the value of the increment/decrement, which is the operand
size still. This usage has been fixed throughout.
The TLB generally needs to know what the address size was in order to
figure out what segment offset was used so that it can do limit checks.
There is some inherent inaccuracy in doing things in reverse like this,
but that's how it works currently. To find that size, the TLB tried to
start from first principles to figure out what the default address size
was, and then whether there was an override was passed in through the
request flags.
This is *very* inaccurate for a few reasons. First, the override doesn't
always apply. Second, the address size used by a particular instruction
doesn't have to be based on any particular size, whether that is the
default or alternate address size, the stack size, etc.
Instead, the instructions now pass the actual size being used in as a 2
bit value (0 -> 1 byte, 1 -> 2 bytes, 2 -> 4 bytes, 3 -> 8 bytes),
avoiding most of the inaccuracy and approximation.
Because the CPU won't embed any size information into fetches, we'll
just assume those have no wrap around within the address size.
Finally, there were microops that had been added which overrode the
address size to be the stack size internally, and try to help the TLB
figure out what to do to figure out the address size. Because both of
those things are now handled in a different way, those microops are no
longer needed or used and have been deleted.
Change-Id: I2b1bdf1acf1540bf643fac6d49fe1a5a576ba5c1
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/55443
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Reviewed-by: Gabe Black <gabe.black@gmail.com>
Global instruction address calculation when using an SGPR or SGPR pair
as a base address was being calculated incorrectly when 64-bit addresses
were to be generated.
From the ISA documentation, the SGPR should be read as 32-bit or 64-bit
depending on "ADDRESS_MODE." The VGPR-offset (computed from the lower
32-bits of vaddr) should always be 32-bits and the offset is 12 bits
from the instruction. This means the 32-bit mask should only be applied
to vaddr to get the VGPU-offset rather than the final sum.
The SGPR base format is being seen in more recent clang/ROCm versions to
avoid unnecessary copies of SGPRs into VGPRs to use VGPRs as the base
address.
Change-Id: I48910611fcfac5b62bc63496bbaabd6f6e53fe0d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/55643
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Ported from https://gem5-review.googlesource.com/c/public/gem5/+/48019:
Certain DS insts are classfied as Loads, but don't actually go through
the memory pipeline. However, any instruction classified as a load
marks its destination registers as free in the memory pipeline.
Because these instructions didn't use the memory pipeline, they
never freed their destination registers, which led to a deadlock.
This patch explicitly calls the function used to free the destination
registers in the execute() method of those Load instructions that
don't use the memory pipeline.
Change-Id: I8231217a79661ca6acc837b2ab4931b946049a1a
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/55463
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Port large DS read/write instructions from
https://gem5-review.googlesource.com/c/public/gem5/+/48342.
This implements the 96 and 128b ds_read/write instructions in a similar
fashion to the 3 and 4 dword flat_load/store instructions.
These instructions are treated as reads/writes of 3 or 4 dwords, instead
of as a single 96b/128b memory transaction, due to the limitations of
the VecOperand class used in the amdgpu code.
In order to handle treating the memory transaction as multiple dwords,
the patch also adds in new initMemRead/initMemWrite functions for ds
instructions. These are similar to the functions used in flat
instructions for the same purpose.
Change-Id: Iee2de14eb7f32b6654799d53dc97d806288af98f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/55344
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Port the fixes for scalar source checks from arch-gcn3 at
https://gem5-review.googlesource.com/c/public/gem5/+/48344.
Scalar sources can either be a general-purpose register or a constant
register that holds a single value.
If we don't check for if the register is a general-purpose register,
it's possible that we get a constant register, which then causes all of
the register mapping code to break, as the constant registers aren't
supposed to be mapped like the general-purpose registers are.
This fix adds an isScalarReg check to the instruction encodings that
were missing it.
Change-Id: I30dd2d082a5a1dcc3075843bcefd325113ed1df6
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/55343
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Remove the line "For use for simulation and test purposes only" in files
were AMD is the only copyright holder listed in the header. This happens
to be the case for all files where this line exists, removing it
completely from gem5.
Change-Id: I623f266b002f564301b28774f49081099cfc60fd
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53943
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Move GpuTLB and TLBCoalescer to GCN3 as the TLB format is specific to
GCN3 and SE mode / APU simulation. Vega will have its own TLB,
coalescer, and walker suitable for a dGPU. This also adds a using alias
for the TLB translation state to reduce the number of references to
TheISA and X86ISA. X86 specific includes are also removed.
Change-Id: I34448bb4e5ddb9980b34a55bc717bbcea0e03db5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/49847
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Global instructions are new in Vega and are essentially FLAT
instructions from GCN3 but guaranteed to go to global memory where as
flat can go to global or local memory.
This reworks the flat instruction classes so that the initiateAcc /
execute / completeAcc logic can be reused for flat, global, and later
scratch subtypes of flat instructions. The decoder creates a flat
instruction class which sets instruction flags based on the flat
instruction's SEG field. There are new initOperandInfo and
generateDissasmbly methods for flat and global. The number of operands
and operand index getters are modified to check the flags and return the
correct value for the subtype.
Change-Id: I1db4a3742aeec62424189e54c38c59d6b1a8d3c1
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47106
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Kyle Roarty <kyleroarty1716@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Similar to the flags issue in the previous patch, the FlatGlobal flag
does not exist. Change all of the flat instructions to use the same
issue logic as GCN3. A helper function is also added as loads and stores
use the same interface. The helper function can be more easily updated
to support global and scratch subtypes of flat instructions.
Change-Id: I394f1d4c59b029201fe2f6075c9dedb3a37dbe31
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/50827
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Kyle Roarty <kyleroarty1716@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The instructions file seems to be assuming a newer pipeline which is not
released. The flags are therefore not set in Vega as the newer pipeline
infers them. This adds back flags for MemoryRef instructions, fixes
waitcnt and removes CondBranch which was not checked and changed to
Branch.
This also removeds unused Cac flags and fixes the casing for ReadsEXEC
and WritesEXEC. The remaining flags are not used at all by the pipeline
and are removed to avoid confusion as to whether these are needed for
GCN3 or not.
Change-Id: I976cbd407a466e8ad77c84dbdc29082f49e28f3b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47102
Reviewed-by: Kyle Roarty <kyleroarty1716@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This function used makeAtomicOpFunctor to create a unique_ptr which
pointed to an AtomicOpFunctor *, which it immediately extracted with
.get(). Then since the temporary unique_ptr went out of scope, it
deleted the AtomicOpFunctor which it just returned a pointer to.
Instead, that function should create a local unique_ptr to pass
ownership of the object off to. It will still be cleaned up when it
goes out of scope, but not before it's done being used.
Change-Id: I74a0bcbb719a78a3e9ec8cb2ea5aa15120da0456
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/49023
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Reviewed-by: Kyle Roarty <kyleroarty1716@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Now that we're using c++17, the type_traits with a ::value member have
a _v alias which reduces verbosity. Or on other words
std::is_integral<T>::value
can be replaced with
std::is_integral_v<T>
Make this substitution throughout the code base. In places where gem5
introduced it's own similar templates, add a V alias, spelled
differently to match gem5's internal style.
gem5: :IsVarArgs<T>::value => gem5::IsVarArgsV<T>
Change-Id: I1d84ffc4a236ad699471569e7916ec17fe5f109a
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/48604
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
Scalar sources can either be a general-purpose register or a constant
register that holds a single value.
If we don't check for if the register is a general-purpose register,
it's possible that we get a constant register, which then causes all of
the register mapping code to break, as the constant registers aren't
supposed to be mapped like the general-purpose registers are.
This fix adds an isScalarReg check to the instruction encodings that
were missing it.
Change-Id: I3d7d5393aa324737301c3269cc227b60e8a159e4
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/48344
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Add support for LDS accesses by allowing Flat instructions to dispatch
into the local memory pipeline if the requested address is in the group
aperture.
This requires implementing LDS accesses in the Flat initMemRead/Write
functions, in a similar fashion to the DS functions of the same name.
Because we now can potentially dispatch to the local memory pipeline,
this change also adds a check to regain any tokens we requested as a
flat instruction.
Change-Id: Id26191f7ee43291a5e5ca5f39af06af981ec23ab
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/48343
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This implements the 96 and 128b ds_read/write instructions in a similar
fashion to the 3 and 4 dword flat_load/store instructions.
These instructions are treated as reads/writes of 3 or 4 dwords, instead
of as a single 96b/128b memory transaction, due to the limitations of
the VecOperand class used in the amdgpu code.
In order to handle treating the memory transaction as multiple dwords,
the patch also adds in new initMemRead/initMemWrite functions for ds
instructions. These are similar to the functions used in flat
instructions for the same purpose.
Change-Id: I0f2ba3cb7cf040abb876e6eae55a6d38149ee960
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/48342
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Alex Dutu <alexandru.dutu@amd.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Certain DS insts are classfied as Loads, but don't actually go through
the memory pipeline. However, any instruction classified as a load
marks its destination registers as free in the memory pipeline.
Because these instructions didn't use the memory pipeline, they
never freed their destination registers, which led to a deadlock.
This patch explicitly calls the function used to free the destination
registers in the execute() method of those Load instructions that
don't use the memory pipeline.
Change-Id: Ic2ac2e232c8fbad63d0c62c1862f2bdaeaba4edf
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/48019
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Certain instructions don't have implementations in instructions.cc,
and get decoded as a nullptr.
This adds a fatal when decoding a missing instruction, as we aren't
able to properly run a program if all its instructions aren't
implemented, and it allows us to figure out which instruction is
missing due to fatals printing the line they were called.
Change-Id: I7e3690f079b790dceee102063773d5fbbc8619f1
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47522
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>