The executed_as field is currently not set for global memory
instructions. This results in the default of SC_NONE, causing the status
vector to be all zeros. The GM pipe sees this and completes the
instruction immediately rather than issuing memory requests. This is
fixed by marking the instruction as executed as SC_GLOBAL always. Flat
instructions use resolvedFlatSegment for this, however since global
instructions are known to be global we can set this field directly. This
results in the expected issuing of memory requests to GPU memory.
Change-Id: Ic23102853ccd49a41e2f083b7bb24f033dfed18a
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57829
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The first big change is gcc-6.4 is no longer supported in FastModel
11.17. We switch to gcc-7.3. Next, TARGET_MAXVIEW is
replaced by TARGET_SYSTEMC_MAXVIEW. The default value of
TARGET_SYSTEMC_MAXVIEW is zero. So we can simply remove TARGET_MAXVIEW.
Finally, I fixed an undefined exception in the build script.
Change-Id: I5ec70112056513c253e6127ed5f8abacf191431f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57549
Reviewed-by: Gabe Black <gabe.black@gmail.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
In FS mode offsets for HSA queues are determined by the driver and
cannot be linearly assigned as they are in SE mode. Add plumbing to pass
the offset of a queue to the HSA packet processor and then to HW
scheduler.
A mapping to/from queue ID <-> doorbell offset are also needed to be
able to unmap queues. ROCm 4.2 is fairly aggressive about context
switching queues, which results in queues being constantly mapped and
unmapped.
Another result of remapping queues is the read index is not preserved in
gem5. The PM4 packet processor will write the currenty read index value
to the MQD before the queue is unmapped. The MQD is written back to
memory on unmap and re-read on mapping to obtain the previous value.
Some helper functions are added to be able to restore the read index
from a non-zero value.
Change-Id: I0153ff53765daccc1de23ea3f4b69fd2fa5a275f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53076
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Add a new option `auto_unlink_shared_backstore` to System so it will
remove the shared backstore used in physical memories when the System is
getting destructed. This will prevent unintended memory leak.
If the shared memory is designed to live through multiple round of
simulations, you may set the option to false to prevent the removal.
Test: Run a simulation with shared_backstore set, and see whether there
is anything left in /dev/shm/ after simulation ends.
Change-Id: I0267b643bd24e62cb7571674fe98f831c13a586d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57469
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Daniel Carvalho <odanrc@yahoo.com.br>
Tested-by: kokoro <noreply+kokoro@google.com>
Add the page table walker, page table format, TLB, TLB coalescer, and
associated support in the AMDGPUDevice. This page table format used the
hardware format for dGPU and is very different from APU/GCN3 which use
the X86 page table format.
In order to support either format for the GPU model, a common
TranslationState called GpuTranslation state is created which holds the
combined fields of both the APU and Vega translation state. Similarly
the TlbEntry is cast at runtime by the corresponding arch files as they
are the only files which touch the internals of the TlbEntry. The GPU
model only checks if a TlbEntry is non-null and thus does not need to
cast to peek inside the data structure.
Change-Id: I4484c66239b48df5224d61caa6e968e56eea38a5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51848
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
dGPUs can translate a virtual address and will not know if the address
resides in system/host memory or device/dGPU memory until the
translation is complete. In order to mark requests as going to either
system memory or device memory we add a field to the Request class.
Change-Id: Ib1e80e8d03ecdfeb11c24d979ccc4b912ce07f91
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51852
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
The ESR.IL field (Instruction Lenght) is set to 0 if the exception
has been triggered by a 16-bit instruction (Thumb) and 1 otherwise.
Current implementation has been implemented more or less correctly
for AArch32 but not for AArch64; by doing:
if (to64) {
esr.il = 1;
} ... [AArch32]
We are directly setting ESR.IL to 1 in case the exception is taken in
AArch64 mode. This is not covering the case of a thumb instruction
faulting to AArch64.
We are fixing this by defining a virtual method returning the ESR.IL
bitfield depending on the exception cause/type. This is following
the Arm Architectural Reference Manual, which states ESR.IL bit should
be set to 1 for 32-bit instructions and for cases where the fault
doesn't really depend on the instruction:
* SError interrupt
* Instruction Abort exception
* PC alignment exception
* SP alignment exception
* Data Abort exception for which the value of the ISV bit is 0.
* Illegal Execution state exception.
* Debug exception except for Breakpoint instruction exceptions
* Exception reported using EC value 0b000000.
Change-Id: I79c9ba8397248c526490e2ed83088fe968029b0e
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57570
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The correct accessor is well known by the code providing a template for
buildReadCode/buildWriteCode, and so can be simply inserted without the
indirection. This makes the code a little easier to read, and those
templating functions simpler and easier to understand.
Change-Id: I403c6e4c291708f8b58cce08bfa32ee2a930c296
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/49737
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Some operand types had read/write code overrides, I think largely by
pattern matching other operand types, and not because that code was
actually expected to be used or to work. Instead, we should just assert
that that code isn't used and remove the implementation. This method of
affecting reading and writing code is going away anyway, and if this is
needed in the future it can be replaced in the new system.
Change-Id: Idae886153aa343570109069cbe54e2c1699a34e5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/49736
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Some instructions behave in special ways in virtual 8086 mode. In some
cases, that means that they behave like they do in real mode, even
though the CPU has protected mode enabled. In other cases, it means that
there are extra checks, or even very different behaviors, which help
virtualize the system for the 8086 programs.
Change-Id: I70723b38ea0a7625c4a557bf4dd8f044e5715172
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/55809
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
When powered on, the "passed self test" bit should not be set. It should
only be set once the I8042 has actually been told to do a self test.
Also the mouse and keyboard should be disabled. With them disabled their
interrupts won't matter, but we might as well leave those disabled as
well.
Change-Id: Ief1ab30365a0a8ea0a116e52c16dcccf441515ec
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/55805
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
When the clone syscall is called, a new process is created which
allocates a new page table. If clone was called with CLONE_THREAD, the
page table of that new process is then marked as shared. Next, initState
is called on the process which calls the page table's initState. For the
multi level page table, initState only sets the base pointer if shared
is false. This means that in this order the base pointer of the new page
table is not currently initialized causing spurious errors.
To fix this, the page table is explicitly initialized after the new
process and new page table are created but before the page table is
marked as shared. The process initState continues as normal and the new
page table's base pointer is not modified by further calls to initState
as it is already marked shared.
Change-Id: I4a533e13565fa572fb9153a926f70958bc7488b7
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/56366
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The page table walker might need to write back page table entries to set
their accessed bits. It was already checking whether the access was 32
or 64 bit when the PTE was retrieved from the incoming packet, but was
not checking the size when it was written back out, causing an assert to
fail when working with 32 bit legacy PTEs.
Change-Id: I7d02241cad20681e6cac0111edf2454335c466fa
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/55808
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The default MTYPE initialization in the emulated GPU driver is currently
doing a bitwise AND on an input integer param with other integers
instead of using a bitmask. Change this to use bitset and test the bit
positions corresponding to the values in the MTYPE enum that were
previously being used as an operand for bitwise AND.
This was causing invalid slicc transitions in some benchmarks for
combinations of request type and mtype that are undefined.
Change-Id: I93fee0eae1fff7141cd14c239c16d1d69925d08d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/56367
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This enhances MOESI_AMD_Base-dir DmaWrite to enable partial writes. This
is currently done by assuming a full cache line, invalidating caches,
and transitioning back to unblocked state. The enhanced write supports
partial writes (i.e., smaller than cache line size) by first reading
memory, merging the modified data, and then writing back to memory.
Implementation of this mirrors that of DmaRead in terms of state. This
means for each DmaRead state (BDR_PM, BDR_Pm, and BDR_M) there is a
write analogue (BDW_PM, BDW_Pm, and BDR_M) and the BDR_P state is
removed. Furthermore, this enhanced DmaWrite ... actually writes data to
memory instead of relying on DirectoryEntry / backing store for correct
data.
There are two possible state transitions for DmaWrite now. (1) Memory
data arrives before probe response and (2) probe response arrives before
memory data. In case (1), probe data overwrites memory data and merges
the partial write using the TBE write mask then updates write mask to
'filled' state. In case (2), probe data is merged with the partial data
using the TBE write mask then updates write mask to 'filled' state. The
memory data will then be clobbered by copying the TBE data over the
response since the write mask is now full.
Change-Id: I1eebb882b464c4c5ee5fd60932fd38d271ada4d7
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57410
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This protocol is using an old style where read/writes to memory were
being done by writing to a DataBlock in a DirectoryMemory entry. This
results in having multiple copies of memory, leads to stale copies in at
least one memory (usually DRAM), and require --access-backing-store in
most cases to work properly. This changeset removes all references to
getDirectoryEntry(...).DataBlk and instead forwards those reads and
writes to DRAM always.
Change-Id: If2e52151789ad82c7b55c8fa2b41c1f4e5b65994
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57409
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Now that the I8259's vector is reported using a special memory access,
the getVector method doesn't need to be accessible outside of the class.
It's still useful internally though, since it nicely encapsulates what
should happen when an INTA signal is received.
Change-Id: I7da7c1f18fac97ffc62c965978f53fb4c5430de3
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/55698
Reviewed-by: Gabe Black <gabe.black@gmail.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This pin should be connected to the master I8259 output which is used to
bypass the IOAPIC when it is disabled and the local APIC is in virtual
wire mode. This is how the system is supposed to start, and can later be
switched into symmetric multiprocessing mode later on by an SMP aware OS
(most of them). Only the BSP should have it's LINT0 pin connected to the
I8259, since I8259 type interrupts are only usable by a single CPU at a
time.
Change-Id: I0e3e3338f14d384c26da660cf54779579eb0d641
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/55696
Reviewed-by: Gabe Black <gabe.black@gmail.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>