SDMA packets which use dmaVirtWrites call their completion event before
the write takes place in the Ruby protocol. This causes a use-after-free
issue corruption random memory locations leading to random errors. This
commit adds a cleanup event for each packet that uses DMA and sets the
cleanup latency as 10000 ticks. In atomic mode, the writes complete
exactly 2000 ticks after the completion event is called and therefore a
fixed latency can be used. This is not tested with timing mode, which
does not work with GPUFS at the moment, so a warning is added to give an
idea where to look in case the same issue occurs once timing mode is
supported.
Change-Id: I9ee2689f2becc46bb7794b18b31205f1606109d8
The SDMA engine copies data in chunks. It currently uses the pointer
returned from new[] and manipulates it using pointer arithmetic. This
modified pointer is then passed to the completion function which deletes
the pointer. Since it is not the original pointer allocated by new[]
this triggers issues in ASAN.
Change-Id: I03ccf026633285e75005509445c62fcbda8eb978
Clang warns as follows: `warning: definition of implicit copy
constructor for 'TranslResult' is deprecated because it has a
user-declared copy assignment operator`
Change-Id: Ic701d8522aac75d569f4f513f54de91f76a17e48
`src/dev/virtio/VirtIORng 2.py` is identical to
`src/dev/virtio/VirtIORng.py`, and the former does not appear in any
build script.
Change-Id: I9c5f1b1a3809d1c7028b630c32310e540613e232
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
SDMA RLC queues do not currently remove their doorbell mapping. This can
cause issues re-registering the queue and prevents the pending doorbells
feature from working. In addition the data value of the doorbell (the
ring buffer rptr) is not saved, leading to UB when this workaround is
used.
This commit removes the doorbell mapping from the gpu device when the
SDMA engine unmaps an RLC queue and copies the next doorbell value to
the pending packet as was originally intended.
Change-Id: Ifd551450f439c065579afcf916f8ff192e7598ab
This change fixes#1148
I have only added an acknowledged return, as we dont ahve remote and
wrap mode so it can only be in stream mode.
Change-Id: I1882042d873ff0e9465c9491238554c8fbb9aa76
This is the version for MI300. For the most part, it is the same as
MI200 with the exception of architected flat scratch (not yet
implemented in gem5) and therefore a new version enum is required.
Change-Id: Id18cd7b57c4eebd467c010a3f61e3117beb8d58a
SDMA ptePde packets are generating a warning that a GART address is
missing, causing a wrong address to be clobbered by the operation.
This commit fixes this by converting the GART address when the queue is
running in privledged mode, which is the only mode allowed to use GART
addresses. This removes the warnings and writes to the correct memory
region.
Change-Id: I64acac308db2431c5996b876bf4cda704f51cf25
Fixes several memory leaks, mostly of small and medium severity. Fixes
mismatched new/new[] and delete/delete[] calls.
Change-Id: Iedafc409389bd94e45f330bc587d6d72d1971219
Hi, we've noticed some issues with the Uart8250 device when using it as
the Linux console. Sometimes the Uart interrupt would remain constantly
posted, so Linux would continue to try and handle it, effectively
resulting in an infinite loop. With this patch, I'm no longer seeing any
issues, but my testing has been limited to configurations and workloads
we're interested in at Imagination, so please let me know if there's
some other tests I should run or if you notice any other issues.
This patch fixes several issues with interrupt posting and clearing in
the uart8250 device.
The "status" member variable and the console interrupt should be kept in
sync. However, in one code path in readIir, the interrupt bit was being
cleared in the status variable but not in the platform controller.
Additionally, in some code paths, the interrupts would be cleared in the
status variable and in the interrupt controller, but a future interrupt
would remain scheduled, causing a spurious interrupt and setting a bit
in status to 1.
These issues can confuse the kernel and result in an ininite interrupt
handling loop.
Another issue is related to the fact that there are two interrupt causes
(TX and RX) and both of them can be valid at the same time. When one of
them becomes no longer valid, we should check the status of the other
one before clearing the interrupt.
This patch addresses the issues listed above and refactors the interrupt
clearing logic to reduce repetition.
The GPU device currently supports large BAR which means that the driver
can write directly to GPU memory over the PCI bus without using SDMA or
PM4 packets. The gem5 PCI interface only provides an atomic interface
for BAR reads/writes, which means the values cannot go through timing
mode Ruby caches. This causes bugs as the TCC cache is allowed to keep
clean data between kernels for performance reasons. If there is a BAR
write directly to memory bypassing the cache, the value in the cache is
stale and must be invalidated.
In this commit a TCC invalidate is generated for all writes over PCI
that go directly to GPU memory. This will also invalidate TCP along the
way if necessary. This currently relies on the driver synchonization
which only allows BAR writes in between kernels. Therefore, the cache
should only be in I or V state.
To handle a race condition between invalidates and launching the next
kernel, the invalidates return a response and the GPU command processor
will wait for all TCC invalidates to be complete before launching the
next kernel.
This fixes issues with stale data in nanoGPT and possibly PENNANT.
The GPU device currently supports large BAR which means that the driver
can write directly to GPU memory over the PCI bus without using SDMA or
PM4 packets. The gem5 PCI interface only provides an atomic interface
for BAR reads/writes, which means the values cannot go through timing
mode Ruby caches. This causes bugs as the TCC cache is allowed to keep
clean data between kernels for performance reasons. If there is a BAR
write directly to memory bypassing the cache, the value in the cache is
stale and must be invalidated.
In this commit a TCC invalidate is generated for all writes over PCI
that go directly to GPU memory. This will also invalidate TCP along the
way if necessary. This currently relies on the driver synchonization
which only allows BAR writes in between kernels. Therefore, the cache
should only be in I or V state.
To handle a race condition between invalidates and launching the next
kernel, the invalidates return a response and the GPU command processor
will wait for all TCC invalidates to be complete before launching the
next kernel.
This fixes issues with stale data in nanoGPT and possibly PENNANT.
Change-Id: I8e1290f842122682c271e5508a48037055bfbcdf
This PR is offloading some of the partitioning logic to the partitioning
manager, effectively changing
the partitioning interface. Rather than always relying on the
PartitionFieldExtention data structure to
convey partition IDs, we make it implementation defined by introducing
the partitioning manager abstraction.
We want user to be able to extract the partitionId more flexibly and
this requires using a SimObject.
Users can extend the PartitioningManager, overriding the
readPacketPartitionId, therefore providing their
own mean of injecting/extracting partitioning data from a packet
The new ISA-agnostic interface is the PartitionManager.
We therefore make the PartitionFieldExtention private to the
Arm implementation of memory partitioning (FEAT_MPAM)
Any other partitioning implementation should override the
PartitionManager::readPacketPartitionID to provide a mean
for extracting partitioning data (partition_id) from the
incoming Packet.
With this commit we also define an MPAM MSC which is
supposed to be the partitioning manager for the
Memory System Component
Change-Id: I6959ace0c0cbca549dcc1aacd53dff223b5fe328
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Remove the following files:
* src/dev/virtio/rng 2.cc
* src/dev/virtio/rng 2.hh
Which were a copy of rng.hh and rng.cc. Probably added to the repository
by accident. They were not compiled by scons
Change-Id: I9d1da19cc243c513ab7af887b1b6260d8e361b57
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Currently gem5 assumes that there is only one command processor (CP)
which contains the PM4 packet processor. Some GPU devices have multiple
CPs which the driver tests individually during POST if they are used or
not. Therefore, these additional CPs need to be supported.
This commit allows for multiple PM4 packet processors which represent
multiple CPs. Each of these processors will have its own independent
MMIO address range. To more easily support ranges, the MMIO addresses
now use AddrRange to index a PM4 packet processor instead of the
hard-coded constexpr MMIO start and size pairs.
By default only one PM4 packet processor is created, meaning the
functionality of the simulation is unchanged for devices currently
supported in gem5.
Change-Id: I977f4fd3a169ef4a78671a4fb58c8ea0e19bf52c
PCIe can read/write to any 32-bit address using the PCI index/index2
registers as an address and then reading/writing the corresponding
data/data2 register.
This commit adds this functionality and removes one magic value being
written to support GPU POST. This feature is disabled for Vega10 which
relies on an MMIO trace for too many values to implement in the MMIO
interface.
Change-Id: Iacfdd1294a7652fc3e60304b57df536d318c847b
The SRBM write packets where previously not required. This commit
implements SRBM writes to set a register by using the new setRegVal
interface. SRBM writes seem to be used for SRIOV enabled devices.
Change-Id: I202653d339e882e8de59d69a995f65332b2dfb8c
The top level AMDGPUDevice currently reads/writes all unknown registers
to/from a map containing the previously written value. This is intended
as a way to handle registers that are not part of the model but the
driver requires for functionality. Since this is at the top level, it
can mask changes to register values which do not go through the same
interface. For example, reading an MMIO, changing via PM4 queue, and
reading again returns the stale cached value.
This commit removes the usage of the regs map in AMDGPUDevice,
implements some important MMIOs that were previously handled by it, and
moves the unknown register handling to the NBIO aperture only. To reduce
the number of additional MMIOs to implement, the display manager in
vega10 is now disabled.
Change-Id: Iff0a599dd82d663c7e710b79c6ef6d0ad1fc44a2
The SDMA engine can potentially be used to write to the GART address
range. Since gem5 has a shadow copy of the GART table to avoid sending
functional reads to device memory, the GART table must be updated when
copying to the GART range.
This changeset adds a check in the VM for GART range and implements the
SDMA copy packet writing to the GART range. A fatal is added to write
and ptePde, which are the only other two ways to write to memory, as
using these packets to update the GART table has not been observed.
Change-Id: I1e62dfd9179cc9e987659e68414209fd77bba2bd
The write data packet can write multiple dwords but currently always
assumes there is one dword, which can cause some write data to be
missed. This case is not common, but the number of dwords is implicitly
defined in the PM4 header.
This changeset passes the PM4 header to write data so that the correct
number of dwords can be determined. For now we assume no page crossing
when writing multiple dwords as the driver should be checking for that.
Change-Id: I0e8c3cbc28873779f468c2a11fdcf177210a22b7
The ROCm 6.0 driver adds a node_id field to interrupts which must match
before passing on the interrupt to be cleared by the cookie from gem5's
interrupt handler implementation. Add this field and enable for gfx942.
The usage of the field can be seen in event_interrupt_isr_v9_4_3 at
https://github.com/ROCm/ROCK-Kernel-Driver/blob/roc-6.0.x/drivers/
gpu/drm/amd/amdkfd/kfd_int_process_v9.c#L449
Change-Id: Iae8b8f0386a5ad2852b4a3c69f2c161d965c4922
The SMMU_IRQ_CTRL had been made optionally writeable by a
prior patch [1] even if interrupts were not supported in
the SMMUv3 model.
As we are partially enabling IRQ support, we remove this option
and we make the SMMU_IRQ_CTRL always writeable
[1]: https://gem5-review.googlesource.com/c/public/gem5/+/38555
Change-Id: Ie1f9458d583a5d8bcbe450c3e88bda6b3c53cf10
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
See https://github.com/orgs/gem5/discussions/898
The SMMUv3 Event Queue is basically unused at the moment. Whenever a
transaction fails we actually abort simulation. The sendEvent method
could be used to actually report the failure to the driver but it is
lacking interrupt support to notify the PE there is an event to handle.
The SMMUv3 spec allows both wired and MSI interrupts to be used.
We add the eventq_irq SPI param to the SMMU object and we draft an
initial sendInterrupt utility that makes use of it whenever it is
needed.
Change-Id: I6d103919ca8bf53794ae4bc922cbdc7156adf37a
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Rely on the architected solution instead of aborting simulation.
This means handling writes to the Event queue to signal managing
software there was a fault in the SMMU
Change-Id: I7b69ca77021732c6059bd6b837ae722da71350ff
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The struct fields of the SMMUEvent were not matching the SMMUv3 specs.
This was "not an issue" as events have been implicitly disabled until
now (every translation error was aborting simulation)
With generateEvent we automatically construct a SMMU event from
a translation result.
Change-Id: Iba6a08d551c0a99bb58c4118992f1d2b683f62cf
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
A faulting translation should return additional information
(other than the fault type). This will be used by future
patches to properly populate the SMMU event record of the
event queue
As we currenlty support two faults only:
1) F_TRANSLATION
2) F_PERMISSION
We add to TranslResult the relevant fault information only:
type, class, stage and ipa
Change-Id: I0a81d5fc202e1b6135cecdcd6dfd2239c2f1ba7e
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reading the Context Descriptor (CD) might require a stage2
translation. At the moment doReadCD does not check for the
return value of the translateStage2.
This means that any stage2 fault will be silently discarded
and an invalid address will be used/returned.
By returning a translation result we make sure any error
happening in the second stage of translation will be properly
flagged
Change-Id: I2ecd43f7e23080bf8222bc3addfabbd027ee8feb
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
We don't check the fault type directly. This will improve
readability once the TranslResult class will be augmented
with extra fields
Change-Id: I5acafaabf098d6ee79e1f0c384499cc043a75a9d
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
One of the limitations of the RegBank class is that it does not allow
you to pass a non-contiguous set of registers. Its simplest form will
just accept an initializer list of registers and it will store them in
sequence.
A more refined version [1] will optionally accept an offset value to be
passed alongside the register reference. This is not meant to be used by
the register bank to store the register at the provided offset.
It is rather used by the bank to sanity check the register sits exactly
at the provided range.
The way to work around this for a fragemented register space is to
explicitly allocate RAZ/RAO blocks as registers and to pass them to
addRegisters together with the others. (See the SysSecCtrl [2] as an
example)
This makes it a bit tedious to model a register bank with gaps between
its registers. First, the exact number and position of the gaps needs to
be extraced from a spec. These sometimes report only implemented
registers and their offset, and omit to document gaps/reserved space. So
a developer needs to manually add register offset and size to check if
all registers are contiguous. Second, these reserved register blocks
need to be instantiated in the bank adding boilerplate code and
affecting readibility.
For these reasons we add a new registration method, called
addRegisters*At*. It reuses the RegisterAdder class but this time the
offset field is really used to instruct the bank where the register
should be mapped. The method is templated and the template parameter
tells the bank which register type should be used to fill the remaining
space. We make the RegBank the owner of this filler space (registers are
generated internally within addRegistersAt).
[1]: https://github.com/gem5/gem5/blob/stable/src/dev/reg_bank.hh#L106
[2]: https://github.com/gem5/gem5/blob/stable/src/dev/arm/ssc.cc#L48
Change-Id: I614ae6e9eeb40b365ac9b6dd8b75abbfdb9cb687
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Update the PLIC based on the
[riscv-plic-spec](https://github.com/riscv/riscv-plic-spec) in the PR:
- Support customized PLIC hardID and privilege mode configuration
- Backward compatable with the n_contexts parameter, will generate the
config like {0,M}, {0,S}, {1,M} ...
Change-Id: Ibff736827edb7c97921e01fa27f503574a27a562
ArmSigInterruptPin don't send the interrupt to GIC. Instead it sends the
interrupt to the irq specified in Param. When using ArmSigInterruptPin,
we shouldn't ask users to provide "Platform" since it doesn't need it.
To reduce the confusion, this change removes the dependency of Platform
for ArmSigInterruptPin.
Change-Id: I0ee507ed1c08b4fa6d3e384e28732f3acb4f6892
The PCI configuration space is 256 bytes, yet because the
PCI_CONFIG_SIZE macro is 0xff, the final register allocation in the IDE
controller only allocated up to byte 255.
Change-Id: I1aef2cad9df366ee8425edb410037061eb29ae33
If the NOP count of an SDMA NOP packet goes beyond the wptr address, the
queue decode method will loop infinitely. If a packet comes in with a
bad count this causes gem5 to hang. This change advances the rptr one
dword at a time until either reaching the NOP count or when rptr == wptr
to prevent this issue.
Change-Id: Ib2c0f74a477bff27890c9c064bb4190e76e513bd
Related to issue #703 , this PR removes GCN3 related files and updates
source code, documentation, and tests to switch over to Vega is that was
not done already. Highlights are:
- Remove all src/arch/amdgpu/gcn3 files and update Kconfigs.
- Remove references to GCN3 and replace with Vega where applicable.
- Update the build targets in the gcn-gpu Docker. This will need to be
rebuilt but not urgently.
- Remove the GCN3 tag in testlib. Most tests seem to be using Vega
already, so that commit is small.
By default all SDMA queues are privileged queues, meaning the addresses
in SDMA packets use the privileged translation tables. RLC queues
(sometimes called user queues) are not necessarily privileged and might
use user translation tables. RLC queues are used more often in ROCm 6.0
exposing an issue with invalid translations with RLC queues.
This changeset checks the priv bit in the SDMA MQD when an RLC queue is
mapped. Each packet type which uses an address then checks the bit
before performing translation. Tested with daily/weekly tests with a
ROCm 6.0 disk image and tests are passing.
Change-Id: I6122fbc194e8d6f5d38e81f1b0e11646d90e0ea0
Replace instances of "GCN3" with Vega. Remove gfx801 and gfx803. Rename
FIJI to Vega and Carrizo to Raven.
Using misc since there is not enough room to fit all the tags.
Change-Id: Ibafc939d49a69be9068107a906e878408c7a5891
The PR contains the following changes:
- Move all of the config options(`env["CONF"]`) from SConsopt to Kconfig
files
- Update `build_opts` files to Kconfig option formats
- The Ruby Protocol files are only built if `RUBY=y`
- Remove the default-default build target
- Kconfig commands are included in the PR:
- defconfig
- setconfig
- meunconfig
- guiconfig
- listnewconfig
- savedefconfig
- oldconfig
- olddefconfig
- Add the `python3-tk` package dependencies
Jira issue: https://gem5.atlassian.net/browse/GEM5-1211
The GPU device keeps a local copy of each ring buffers read pointer
(rptr) to avoid constant DMAs to/from host memory. This means it needs
to be periodically updated on the host side as the driver uses this to
determine how much space is left in the queue and may hang if it believe
the queue is full. For user-mode queues, this already happens when
queues are unmapped. For kernel mode queues (e.g., HIQ, KIQ) the rptr is
never updated leading to a hang.
In this patch the rptr for *all* queues is reported back to the kernel
whenever the queue reaches an empty state (rptr == wptr). Additionally
to handle PM4 queue wrap-around, the queue processing function checks if
the queue is not empty instead of rptr < wptr. This is state because the
driver fills PM4 queues with NOP packets on initialization and when wrap
around occurs.
Change-Id: Ie13a4354f82999208a75bb1eaec70513039ff30f
These are not yet consumed by anything, but convert all the settings
from SCons variables to Kconfig variables.
If you have existing SConsopts files which need to be converted, you
should take a look at KCONFIG.md to learn about how kconfig is used in
gem5. You should decide if any variables need to be available to C++ or
kconfig itself, and whether those are options which should be detected
automatically, or should be up to the user. Options which should be
measured automatically should still be in SConsopts files, while user
facing options should be added to new or existing Kconfig files.
Generally, make sure you're storing c++/kconfig visible options in
env['CONF'][...]. Also remove references to sticky_vars since persistent
options should now be handled with kconfig, and export_vars since
everything in env['CONF'] is now exported automatically.
Switch SCons/gem5 to use Kconfig for configuration, except EXTRAS which
is still a sticky SCons variable. This is necessary because EXTRAS also
controls what config options exist. If it came from Kconfig itself, then
there would be a circular dependency. This dependency could
theoretically be handled by reparsing the Kconfig when EXTRAS
directories were added or removed, but that would be complicated, and
isn't supported by kconfiglib. It wouldn't be worth the significant
effort it would take to add it, just to use Kconfig more purely.
Change-Id: I29ab1940b2d7b0e6635a490452d05befe5b4a2c9
When restoring checkpoints for certain applications, gem5 tries to
create new doorbells with a pre-existing queue ID and simulation crashes
shortly after. This commit adds existing IDs to the GPU device's used
VMID map so that new doorbells are aware of existing queue IDs and use a
new ID. This ensures that queue IDs are unique after checkpoint
restoration
When restoring checkpoints for certain applications, gem5 tries to
create new doorbells with a pre-existing queue ID and simulation crashes
shortly after. This commit checkpoints the existing VMID map so that any
new doorbells after restoration use a unique queue ID
Change-Id: I9bf89a2769db26ceab4441634ff2da936eea6d6f
https://github.com/gem5/gem5/pull/386 included two cases in
"src/dev/reg_bank.hh" where `std:: min` was used to compare a an integer
of type `size_t` and another of type `Addr`. This cause an error on my
Apple Silicon Mac as this is a comparison between an "unsigned long"
and an "unsigned long long" which (at least on my setup) was not
permitted. To fix this issue the `reg_size` was changed from `size_t` to
`Addr`, as well as it the types of the values it was derived from and
the variable used to hold the return from the `std::min` calls.
Change-Id: I31e9c04a8e0327d4f6f5390bc5a743c629db4746
When restoring checkpoints for certain applications, gem5 tries to
create new doorbells with a pre-existing queue ID and simulation crashes
shortly after. This commit adds existing IDs to the GPU device's used
VMID map so that new doorbells are aware of existing queue IDs and use a
new ID. This ensures that queue IDs are unique after checkpoint
restoration
Change-Id: I9bf89a2769db26ceab4441634ff2da936eea6d6f
Print extra logs for the full/partial read/write access to the registers
through the register bank. The debug flag is empty by default and would
not print anything.
Test: run unittest of dev/reg_bank.test.xml to check the behavior would
not affect the original functionality.
run gem5 with debug flags and use m5term to poke on registers.