This PR is offloading some of the partitioning logic to the partitioning
manager, effectively changing
the partitioning interface. Rather than always relying on the
PartitionFieldExtention data structure to
convey partition IDs, we make it implementation defined by introducing
the partitioning manager abstraction.
We want user to be able to extract the partitionId more flexibly and
this requires using a SimObject.
Users can extend the PartitioningManager, overriding the
readPacketPartitionId, therefore providing their
own mean of injecting/extracting partitioning data from a packet
The new ISA-agnostic interface is the PartitionManager.
We therefore make the PartitionFieldExtention private to the
Arm implementation of memory partitioning (FEAT_MPAM)
Any other partitioning implementation should override the
PartitionManager::readPacketPartitionID to provide a mean
for extracting partitioning data (partition_id) from the
incoming Packet.
With this commit we also define an MPAM MSC which is
supposed to be the partitioning manager for the
Memory System Component
Change-Id: I6959ace0c0cbca549dcc1aacd53dff223b5fe328
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Remove the following files:
* src/dev/virtio/rng 2.cc
* src/dev/virtio/rng 2.hh
Which were a copy of rng.hh and rng.cc. Probably added to the repository
by accident. They were not compiled by scons
Change-Id: I9d1da19cc243c513ab7af887b1b6260d8e361b57
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Currently gem5 assumes that there is only one command processor (CP)
which contains the PM4 packet processor. Some GPU devices have multiple
CPs which the driver tests individually during POST if they are used or
not. Therefore, these additional CPs need to be supported.
This commit allows for multiple PM4 packet processors which represent
multiple CPs. Each of these processors will have its own independent
MMIO address range. To more easily support ranges, the MMIO addresses
now use AddrRange to index a PM4 packet processor instead of the
hard-coded constexpr MMIO start and size pairs.
By default only one PM4 packet processor is created, meaning the
functionality of the simulation is unchanged for devices currently
supported in gem5.
Change-Id: I977f4fd3a169ef4a78671a4fb58c8ea0e19bf52c
PCIe can read/write to any 32-bit address using the PCI index/index2
registers as an address and then reading/writing the corresponding
data/data2 register.
This commit adds this functionality and removes one magic value being
written to support GPU POST. This feature is disabled for Vega10 which
relies on an MMIO trace for too many values to implement in the MMIO
interface.
Change-Id: Iacfdd1294a7652fc3e60304b57df536d318c847b
The SRBM write packets where previously not required. This commit
implements SRBM writes to set a register by using the new setRegVal
interface. SRBM writes seem to be used for SRIOV enabled devices.
Change-Id: I202653d339e882e8de59d69a995f65332b2dfb8c
The top level AMDGPUDevice currently reads/writes all unknown registers
to/from a map containing the previously written value. This is intended
as a way to handle registers that are not part of the model but the
driver requires for functionality. Since this is at the top level, it
can mask changes to register values which do not go through the same
interface. For example, reading an MMIO, changing via PM4 queue, and
reading again returns the stale cached value.
This commit removes the usage of the regs map in AMDGPUDevice,
implements some important MMIOs that were previously handled by it, and
moves the unknown register handling to the NBIO aperture only. To reduce
the number of additional MMIOs to implement, the display manager in
vega10 is now disabled.
Change-Id: Iff0a599dd82d663c7e710b79c6ef6d0ad1fc44a2
The SDMA engine can potentially be used to write to the GART address
range. Since gem5 has a shadow copy of the GART table to avoid sending
functional reads to device memory, the GART table must be updated when
copying to the GART range.
This changeset adds a check in the VM for GART range and implements the
SDMA copy packet writing to the GART range. A fatal is added to write
and ptePde, which are the only other two ways to write to memory, as
using these packets to update the GART table has not been observed.
Change-Id: I1e62dfd9179cc9e987659e68414209fd77bba2bd
The write data packet can write multiple dwords but currently always
assumes there is one dword, which can cause some write data to be
missed. This case is not common, but the number of dwords is implicitly
defined in the PM4 header.
This changeset passes the PM4 header to write data so that the correct
number of dwords can be determined. For now we assume no page crossing
when writing multiple dwords as the driver should be checking for that.
Change-Id: I0e8c3cbc28873779f468c2a11fdcf177210a22b7
The ROCm 6.0 driver adds a node_id field to interrupts which must match
before passing on the interrupt to be cleared by the cookie from gem5's
interrupt handler implementation. Add this field and enable for gfx942.
The usage of the field can be seen in event_interrupt_isr_v9_4_3 at
https://github.com/ROCm/ROCK-Kernel-Driver/blob/roc-6.0.x/drivers/
gpu/drm/amd/amdkfd/kfd_int_process_v9.c#L449
Change-Id: Iae8b8f0386a5ad2852b4a3c69f2c161d965c4922
The SMMU_IRQ_CTRL had been made optionally writeable by a
prior patch [1] even if interrupts were not supported in
the SMMUv3 model.
As we are partially enabling IRQ support, we remove this option
and we make the SMMU_IRQ_CTRL always writeable
[1]: https://gem5-review.googlesource.com/c/public/gem5/+/38555
Change-Id: Ie1f9458d583a5d8bcbe450c3e88bda6b3c53cf10
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
See https://github.com/orgs/gem5/discussions/898
The SMMUv3 Event Queue is basically unused at the moment. Whenever a
transaction fails we actually abort simulation. The sendEvent method
could be used to actually report the failure to the driver but it is
lacking interrupt support to notify the PE there is an event to handle.
The SMMUv3 spec allows both wired and MSI interrupts to be used.
We add the eventq_irq SPI param to the SMMU object and we draft an
initial sendInterrupt utility that makes use of it whenever it is
needed.
Change-Id: I6d103919ca8bf53794ae4bc922cbdc7156adf37a
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Rely on the architected solution instead of aborting simulation.
This means handling writes to the Event queue to signal managing
software there was a fault in the SMMU
Change-Id: I7b69ca77021732c6059bd6b837ae722da71350ff
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The struct fields of the SMMUEvent were not matching the SMMUv3 specs.
This was "not an issue" as events have been implicitly disabled until
now (every translation error was aborting simulation)
With generateEvent we automatically construct a SMMU event from
a translation result.
Change-Id: Iba6a08d551c0a99bb58c4118992f1d2b683f62cf
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
A faulting translation should return additional information
(other than the fault type). This will be used by future
patches to properly populate the SMMU event record of the
event queue
As we currenlty support two faults only:
1) F_TRANSLATION
2) F_PERMISSION
We add to TranslResult the relevant fault information only:
type, class, stage and ipa
Change-Id: I0a81d5fc202e1b6135cecdcd6dfd2239c2f1ba7e
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reading the Context Descriptor (CD) might require a stage2
translation. At the moment doReadCD does not check for the
return value of the translateStage2.
This means that any stage2 fault will be silently discarded
and an invalid address will be used/returned.
By returning a translation result we make sure any error
happening in the second stage of translation will be properly
flagged
Change-Id: I2ecd43f7e23080bf8222bc3addfabbd027ee8feb
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
We don't check the fault type directly. This will improve
readability once the TranslResult class will be augmented
with extra fields
Change-Id: I5acafaabf098d6ee79e1f0c384499cc043a75a9d
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
One of the limitations of the RegBank class is that it does not allow
you to pass a non-contiguous set of registers. Its simplest form will
just accept an initializer list of registers and it will store them in
sequence.
A more refined version [1] will optionally accept an offset value to be
passed alongside the register reference. This is not meant to be used by
the register bank to store the register at the provided offset.
It is rather used by the bank to sanity check the register sits exactly
at the provided range.
The way to work around this for a fragemented register space is to
explicitly allocate RAZ/RAO blocks as registers and to pass them to
addRegisters together with the others. (See the SysSecCtrl [2] as an
example)
This makes it a bit tedious to model a register bank with gaps between
its registers. First, the exact number and position of the gaps needs to
be extraced from a spec. These sometimes report only implemented
registers and their offset, and omit to document gaps/reserved space. So
a developer needs to manually add register offset and size to check if
all registers are contiguous. Second, these reserved register blocks
need to be instantiated in the bank adding boilerplate code and
affecting readibility.
For these reasons we add a new registration method, called
addRegisters*At*. It reuses the RegisterAdder class but this time the
offset field is really used to instruct the bank where the register
should be mapped. The method is templated and the template parameter
tells the bank which register type should be used to fill the remaining
space. We make the RegBank the owner of this filler space (registers are
generated internally within addRegistersAt).
[1]: https://github.com/gem5/gem5/blob/stable/src/dev/reg_bank.hh#L106
[2]: https://github.com/gem5/gem5/blob/stable/src/dev/arm/ssc.cc#L48
Change-Id: I614ae6e9eeb40b365ac9b6dd8b75abbfdb9cb687
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Update the PLIC based on the
[riscv-plic-spec](https://github.com/riscv/riscv-plic-spec) in the PR:
- Support customized PLIC hardID and privilege mode configuration
- Backward compatable with the n_contexts parameter, will generate the
config like {0,M}, {0,S}, {1,M} ...
Change-Id: Ibff736827edb7c97921e01fa27f503574a27a562
ArmSigInterruptPin don't send the interrupt to GIC. Instead it sends the
interrupt to the irq specified in Param. When using ArmSigInterruptPin,
we shouldn't ask users to provide "Platform" since it doesn't need it.
To reduce the confusion, this change removes the dependency of Platform
for ArmSigInterruptPin.
Change-Id: I0ee507ed1c08b4fa6d3e384e28732f3acb4f6892
The PCI configuration space is 256 bytes, yet because the
PCI_CONFIG_SIZE macro is 0xff, the final register allocation in the IDE
controller only allocated up to byte 255.
Change-Id: I1aef2cad9df366ee8425edb410037061eb29ae33
If the NOP count of an SDMA NOP packet goes beyond the wptr address, the
queue decode method will loop infinitely. If a packet comes in with a
bad count this causes gem5 to hang. This change advances the rptr one
dword at a time until either reaching the NOP count or when rptr == wptr
to prevent this issue.
Change-Id: Ib2c0f74a477bff27890c9c064bb4190e76e513bd
Related to issue #703 , this PR removes GCN3 related files and updates
source code, documentation, and tests to switch over to Vega is that was
not done already. Highlights are:
- Remove all src/arch/amdgpu/gcn3 files and update Kconfigs.
- Remove references to GCN3 and replace with Vega where applicable.
- Update the build targets in the gcn-gpu Docker. This will need to be
rebuilt but not urgently.
- Remove the GCN3 tag in testlib. Most tests seem to be using Vega
already, so that commit is small.
By default all SDMA queues are privileged queues, meaning the addresses
in SDMA packets use the privileged translation tables. RLC queues
(sometimes called user queues) are not necessarily privileged and might
use user translation tables. RLC queues are used more often in ROCm 6.0
exposing an issue with invalid translations with RLC queues.
This changeset checks the priv bit in the SDMA MQD when an RLC queue is
mapped. Each packet type which uses an address then checks the bit
before performing translation. Tested with daily/weekly tests with a
ROCm 6.0 disk image and tests are passing.
Change-Id: I6122fbc194e8d6f5d38e81f1b0e11646d90e0ea0
Replace instances of "GCN3" with Vega. Remove gfx801 and gfx803. Rename
FIJI to Vega and Carrizo to Raven.
Using misc since there is not enough room to fit all the tags.
Change-Id: Ibafc939d49a69be9068107a906e878408c7a5891
The PR contains the following changes:
- Move all of the config options(`env["CONF"]`) from SConsopt to Kconfig
files
- Update `build_opts` files to Kconfig option formats
- The Ruby Protocol files are only built if `RUBY=y`
- Remove the default-default build target
- Kconfig commands are included in the PR:
- defconfig
- setconfig
- meunconfig
- guiconfig
- listnewconfig
- savedefconfig
- oldconfig
- olddefconfig
- Add the `python3-tk` package dependencies
Jira issue: https://gem5.atlassian.net/browse/GEM5-1211
The GPU device keeps a local copy of each ring buffers read pointer
(rptr) to avoid constant DMAs to/from host memory. This means it needs
to be periodically updated on the host side as the driver uses this to
determine how much space is left in the queue and may hang if it believe
the queue is full. For user-mode queues, this already happens when
queues are unmapped. For kernel mode queues (e.g., HIQ, KIQ) the rptr is
never updated leading to a hang.
In this patch the rptr for *all* queues is reported back to the kernel
whenever the queue reaches an empty state (rptr == wptr). Additionally
to handle PM4 queue wrap-around, the queue processing function checks if
the queue is not empty instead of rptr < wptr. This is state because the
driver fills PM4 queues with NOP packets on initialization and when wrap
around occurs.
Change-Id: Ie13a4354f82999208a75bb1eaec70513039ff30f
These are not yet consumed by anything, but convert all the settings
from SCons variables to Kconfig variables.
If you have existing SConsopts files which need to be converted, you
should take a look at KCONFIG.md to learn about how kconfig is used in
gem5. You should decide if any variables need to be available to C++ or
kconfig itself, and whether those are options which should be detected
automatically, or should be up to the user. Options which should be
measured automatically should still be in SConsopts files, while user
facing options should be added to new or existing Kconfig files.
Generally, make sure you're storing c++/kconfig visible options in
env['CONF'][...]. Also remove references to sticky_vars since persistent
options should now be handled with kconfig, and export_vars since
everything in env['CONF'] is now exported automatically.
Switch SCons/gem5 to use Kconfig for configuration, except EXTRAS which
is still a sticky SCons variable. This is necessary because EXTRAS also
controls what config options exist. If it came from Kconfig itself, then
there would be a circular dependency. This dependency could
theoretically be handled by reparsing the Kconfig when EXTRAS
directories were added or removed, but that would be complicated, and
isn't supported by kconfiglib. It wouldn't be worth the significant
effort it would take to add it, just to use Kconfig more purely.
Change-Id: I29ab1940b2d7b0e6635a490452d05befe5b4a2c9
When restoring checkpoints for certain applications, gem5 tries to
create new doorbells with a pre-existing queue ID and simulation crashes
shortly after. This commit adds existing IDs to the GPU device's used
VMID map so that new doorbells are aware of existing queue IDs and use a
new ID. This ensures that queue IDs are unique after checkpoint
restoration
When restoring checkpoints for certain applications, gem5 tries to
create new doorbells with a pre-existing queue ID and simulation crashes
shortly after. This commit checkpoints the existing VMID map so that any
new doorbells after restoration use a unique queue ID
Change-Id: I9bf89a2769db26ceab4441634ff2da936eea6d6f
https://github.com/gem5/gem5/pull/386 included two cases in
"src/dev/reg_bank.hh" where `std:: min` was used to compare a an integer
of type `size_t` and another of type `Addr`. This cause an error on my
Apple Silicon Mac as this is a comparison between an "unsigned long"
and an "unsigned long long" which (at least on my setup) was not
permitted. To fix this issue the `reg_size` was changed from `size_t` to
`Addr`, as well as it the types of the values it was derived from and
the variable used to hold the return from the `std::min` calls.
Change-Id: I31e9c04a8e0327d4f6f5390bc5a743c629db4746
When restoring checkpoints for certain applications, gem5 tries to
create new doorbells with a pre-existing queue ID and simulation crashes
shortly after. This commit adds existing IDs to the GPU device's used
VMID map so that new doorbells are aware of existing queue IDs and use a
new ID. This ensures that queue IDs are unique after checkpoint
restoration
Change-Id: I9bf89a2769db26ceab4441634ff2da936eea6d6f
Print extra logs for the full/partial read/write access to the registers
through the register bank. The debug flag is empty by default and would
not print anything.
Test: run unittest of dev/reg_bank.test.xml to check the behavior would
not affect the original functionality.
run gem5 with debug flags and use m5term to poke on registers.
The amdgpu driver can, at *any* time, tell the device to unmap a queue
to force the queue descriptor to be written back to main memory in the
form of a memory queue descriptor (MQD). It will then immediately remap
the queue and continue writing the doorbell to the queue. It is possible
that the doorbell write occurs after the queue is unmapped but before it
is remapped. In this situation, we need to check the updated value of
the doorbell for the queue and write that to the queue after it is
mapped.
To handle this, a pending doorbell packet map is created to hold a
packet to replay when the queue is mapped. Because PCI in gem5
implements only the atomic protocol port, we cannot use the original
packet as it must respond in the same Tick. This patch fixes issues with
the doorbell maps not being cleared on unmapping to ensure the doorbell
is not found in writeDoorbell and places in the pending doorbell map.
This includes fixing the doorbell offset value in the doorbell to VMID
map which was is now multiplied by four as it is a dword address.
This was tested using tensorflow 2.0's MNIST example which was seeing
this issue consistently. With this patch it now makes progress and does
issue pending doorbell writes.
Change-Id: Ic6b401d3fe7fc46b7bcbf19a769cdea6814e7d1e
gem5 does not currently implement any vendor-specific HSA packets.
Starting in ROCm 5.5, vendor packets appear to end with a completion
signal. Not sending this completion causes gem5 to hang. Since these
packets are not documented anywhere and need to be reverse engineered we
send the completion signal, if non-zero, and finish the packet as is the
current behavior.
Testing: HIP examples working on most recent ROCm release (5.7.1).
Change-Id: Id0841407bec564c84f590c943f0609b17e01e14c
While it's plausible to define the cache_line_size as a 32-bit unsigned
int, the use of cache_line_size is way out of its original scope.
cache_line_size has been used to produce an address mask, which masking
out the offset bits from an address. For example, [1], [2], [3], and
[4]. However, since the cache_line_size is an "unsigned int", the type
of the value is not guaranteed to be 64-bit long. Subsequently, the bit
twiddling hacks in [1], [2], [3], and [4] produce 32-bit mask, i.e.,
0x00000000FFFFFFC0.
This behavior at least caused a problem in LLSC in RISC-V [5], where the
load reservation (LR) relies on the mask to produce the cache block
address. Two distinct 64-bit addresses can be mapped to the same cache
block using the above mask.
This patch explicitly defines cache_line_size as a 64-bit unsigned int
so the cache block mask can be produced correctly for 64-bit addresses.
[1]
3bdcfd6f7a/src/cpu/simple/atomic.hh (L147)
[2]
3bdcfd6f7a/src/cpu/simple/timing.hh (L224)
[3]
3bdcfd6f7a/src/cpu/o3/lsq_unit.cc (L241)
[4]
3bdcfd6f7a/src/cpu/minor/lsq.cc (L1425)
[5]
3bdcfd6f7a/src/arch/riscv/isa.cc (L787)
Earlier, GPU checkpointing was working only if a checkpoint was created
before the first kernel execution. This pull request adds support to
checkpoint in-between any two kernel calls. It does so by doing the
following.
- Adds flush support in the GPU_VIPER protocol
- Adds flush support in the GPUCoalescer
- Updates cache recorder to use the GPUCoalescer during simulation
cooldown and cache warmup times.
The ROCr runtime uses a combination of HSA signal timestamps and
hardware MMIOs to calculate profiling times. At the beginning of an
application a timestamp is read from the GPU using MMIOs. The clock
MMIOs reside in the GFX MMIO region, so a new AMDGPUGfx class is added
to handle these MMIOs.
The timestamp value is expected to be in nanoseconds, so we simply use
the gem5 tick converted to ns.
Change-Id: I7d1cba40d5042a7f7a81fd4d132402dc11b71bd4
The AMD specific HSA signal contains start/end timestamps for dispatch
packet completion signals. These are current always zero. These
timestamp values are used for profiling in the ROCr runtime.
Unfortunately, the GpuAgent::TranslateTime method in ROCr does not check
for zero values before dividing, causing applications that use profiling
to crash with SIGFPE. Profiling is used via hipEvents in the HACC
application, so these should be supported in gem5.
In order to handle writing the timestamp values, we need to DMA the
values to memory before writing the completion signal. This changes the
flow of the async completion signal write to be (1) read mailbox pointer
(2) if valid, write the mailbox data, other skip to 4 (3) write mailbox
data if pointer is valid (4) write timestamp values (5) write completion
signal. The application will process the timestamp data as soon as the
completion signal is received, so we need to ordering to ensure the DMA
for timestamps was completed.
HACC now runs to completion on GPUFS and has the same output was
hardware.
Change-Id: I09877cdff901d1402140f2c3bafea7605fa6554e
During checkpoint restoration, the unserialize() function writes rptr,
wptr, and indirect buffer rptr, wptr to PM4 queue's rptr, wptr fields.
This commit updates this to write only the relevant pointers to the
queue structure. If indirect buffers are used, then it writes only the
indirect buffer pointers to the queue. If they are not used, then it
writes rptr, wptr values to the queue.
Change-Id: Iedb25a726112e1af99cc1e7bc012de51c4ebfd45
GPUFS uses aql information from PM4 queues to initialize doorbells. This
commit adds aql information to the checkpoint so that it can be used
during restoration to correctly initialize all doorbells. Additionally,
this commit also sets the hsa queue correctly during checkpoint-restoration
Change-Id: Ief3ef6dc973f70f27255234872a12c396df05d89