Commit Graph

1450 Commits

Author SHA1 Message Date
Bobby R. Bruce
d11c40dcac misc: Run pre-commit run --all-files
This ensures `isort` is applied to all files in the repo.

Change-Id: Ib7ced1c924ef1639542bf0d1a01c5737f6ba43e9
2023-11-29 22:06:41 -08:00
Bobby R. Bruce
d94d6017b0 scons: Change to Kconfig build system (#69)
The PR contains the following changes:
- Move all of the config options(`env["CONF"]`) from SConsopt to Kconfig
files
- Update `build_opts` files to Kconfig option formats
- The Ruby Protocol files are only built if `RUBY=y`
- Remove the default-default build target
- Kconfig commands are included in the PR:
    - defconfig
    - setconfig
    - meunconfig
    - guiconfig
    - listnewconfig
    - savedefconfig
    - oldconfig
    - olddefconfig
- Add the `python3-tk` package dependencies
 
Jira issue: https://gem5.atlassian.net/browse/GEM5-1211
2023-11-27 13:59:18 -08:00
Matthew Poremba
9e6a87e67a dev-amdgpu: Writeback PM4 queue rptr when empty (#597)
The GPU device keeps a local copy of each ring buffers read pointer
(rptr) to avoid constant DMAs to/from host memory. This means it needs
to be periodically updated on the host side as the driver uses this to
determine how much space is left in the queue and may hang if it believe
the queue is full. For user-mode queues, this already happens when
queues are unmapped. For kernel mode queues (e.g., HIQ, KIQ) the rptr is
never updated leading to a hang.

In this patch the rptr for *all* queues is reported back to the kernel
whenever the queue reaches an empty state (rptr == wptr). Additionally
to handle PM4 queue wrap-around, the queue processing function checks if
the queue is not empty instead of rptr < wptr. This is state because the
driver fills PM4 queues with NOP packets on initialization and when wrap
around occurs.

Change-Id: Ie13a4354f82999208a75bb1eaec70513039ff30f
2023-11-27 11:02:11 -08:00
Gabe Black
db3a6e8e84 scons: Use Kconfig to configure gem5.
These are not yet consumed by anything, but convert all the settings
from SCons variables to Kconfig variables.

If you have existing SConsopts files which need to be converted, you
should take a look at KCONFIG.md to learn about how kconfig is used in
gem5. You should decide if any variables need to be available to C++ or
kconfig itself, and whether those are options which should be detected
automatically, or should be up to the user. Options which should be
measured automatically should still be in SConsopts files, while user
facing options should be added to new or existing Kconfig files.

Generally, make sure you're storing c++/kconfig visible options in
env['CONF'][...]. Also remove references to sticky_vars since persistent
options should now be handled with kconfig, and export_vars since
everything in env['CONF'] is now exported automatically.

Switch SCons/gem5 to use Kconfig for configuration, except EXTRAS which
is still a sticky SCons variable. This is necessary because EXTRAS also
controls what config options exist. If it came from Kconfig itself, then
there would be a circular dependency. This dependency could
theoretically be handled by reparsing the Kconfig when EXTRAS
directories were added or removed, but that would be complicated, and
isn't supported by kconfiglib. It wouldn't be worth the significant
effort it would take to add it, just to use Kconfig more purely.

Change-Id: I29ab1940b2d7b0e6635a490452d05befe5b4a2c9
2023-11-23 08:26:10 +08:00
Bobby R. Bruce
23a22ed95c dev-amdgpu: Add VMID map to checkpoint (#570)
When restoring checkpoints for certain applications, gem5 tries to
create new doorbells with a pre-existing queue ID and simulation crashes
shortly after. This commit adds existing IDs to the GPU device's used
VMID map so that new doorbells are aware of existing queue IDs and use a
new ID. This ensures that queue IDs are unique after checkpoint
restoration
2023-11-22 10:05:21 -08:00
Vishnu Ramadas
06161ded8c dev-amdgpu: Add VMID map to checkpoint
When restoring checkpoints for certain applications, gem5 tries to
create new doorbells with a pre-existing queue ID and simulation crashes
shortly after. This commit checkpoints the existing VMID map so that any
new doorbells after restoration use a unique queue ID

Change-Id: I9bf89a2769db26ceab4441634ff2da936eea6d6f
2023-11-20 21:19:17 -06:00
Bobby R. Bruce
08c0d1f27a dev: Fix std::min type mismatch in reg_bank.hh
https://github.com/gem5/gem5/pull/386 included two cases in
"src/dev/reg_bank.hh" where `std:: min` was used to compare a an integer
of type `size_t` and another of type `Addr`. This cause an error on my
Apple Silicon Mac as this is a comparison between an "unsigned long"
and an "unsigned long long" which (at least on my setup) was not
permitted. To fix this issue the `reg_size` was changed from `size_t` to
`Addr`, as well as it the types of the values it was derived from and
the variable used to hold the return from the `std::min` calls.

Change-Id: I31e9c04a8e0327d4f6f5390bc5a743c629db4746
2023-11-20 17:33:44 -08:00
Vishnu Ramadas
d19d6fc31e dev-amdgpu: Add PM4 queue ID to GPU used VMID map
When restoring checkpoints for certain applications, gem5 tries to
create new doorbells with a pre-existing queue ID and simulation crashes
shortly after. This commit adds existing IDs to the GPU device's used
VMID map so that new doorbells are aware of existing queue IDs and use a
new ID. This ensures that queue IDs are unique after checkpoint
restoration

Change-Id: I9bf89a2769db26ceab4441634ff2da936eea6d6f
2023-11-16 17:30:00 -06:00
hungweihsuG
83f1fe3fec dev: add debug flag in register bank. (#386)
Print extra logs for the full/partial read/write access to the registers
through the register bank. The debug flag is empty by default and would
not print anything.

Test: run unittest of dev/reg_bank.test.xml to check the behavior would
not affect the original functionality.
run gem5 with debug flags and use m5term to poke on registers.
2023-11-15 10:04:46 -08:00
Jason Lowe-Power
71973b386e gpu-compute,dev-hsa: ROCm 5.5+ support (#498)
ROCm 5.5 support including:
- Vendor packet completion signals
- Queue remapping race condition fix
- Backwards compatible GPR allocation
- Fix transient readBlob fatal reading kernel descriptor
2023-11-06 10:51:37 -08:00
Matthew Poremba
37da1c45f3 dev-amdgpu: Better handling for queue remapping
The amdgpu driver can, at *any* time, tell the device to unmap a queue
to force the queue descriptor to be written back to main memory in the
form of a memory queue descriptor (MQD). It will then immediately remap
the queue and continue writing the doorbell to the queue. It is possible
that the doorbell write occurs after the queue is unmapped but before it
is remapped. In this situation, we need to check the updated value of
the doorbell for the queue and write that to the queue after it is
mapped.

To handle this, a pending doorbell packet map is created to hold a
packet to replay when the queue is mapped. Because PCI in gem5
implements only the atomic protocol port, we cannot use the original
packet as it must respond in the same Tick. This patch fixes issues with
the doorbell maps not being cleared on unmapping to ensure the doorbell
is not found in writeDoorbell and places in the pending doorbell map.
This includes fixing the doorbell offset value in the doorbell to VMID
map which was is now multiplied by four as it is a dword address.

This was tested using tensorflow 2.0's MNIST example which was seeing
this issue consistently. With this patch it now makes progress and does
issue pending doorbell writes.

Change-Id: Ic6b401d3fe7fc46b7bcbf19a769cdea6814e7d1e
2023-11-01 14:52:39 -05:00
Matthew Poremba
d05433b3f6 gpu-compute,dev-hsa: Send vendor packet completion signal
gem5 does not currently implement any vendor-specific HSA packets.
Starting in ROCm 5.5, vendor packets appear to end with a completion
signal. Not sending this completion causes gem5 to hang. Since these
packets are not documented anywhere and need to be reverse engineered we
send the completion signal, if non-zero, and finish the packet as is the
current behavior.

Testing: HIP examples working on most recent ROCm release (5.7.1).

Change-Id: Id0841407bec564c84f590c943f0609b17e01e14c
2023-11-01 14:52:39 -05:00
Hoa Nguyen
50196863a4 stdlib,dev: Fix several hardcoded RISC-V ISA strings
The "s" and "u" letters are not recognized by the Linux kernel as
RISC-V extensions [1].

[1] https://elixir.bootlin.com/linux/v6.5.7/source/arch/riscv/kernel/cpufeature.c#L170

Change-Id: I2a99557482cde6e6d6160626b3995275c41b1577
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
2023-10-25 20:12:57 +00:00
Bobby R. Bruce
334df18dce arch-riscv: Add bootloader+kernel workload (#390)
Aims to boot OpenSBI + Linux kernel.
2023-10-18 09:17:12 -07:00
Bobby R. Bruce
d42eeb6b68 cpu: Explicitly define cache_line_size -> 64-bit unsigned int (#329)
While it's plausible to define the cache_line_size as a 32-bit unsigned
int, the use of cache_line_size is way out of its original scope.

cache_line_size has been used to produce an address mask, which masking
out the offset bits from an address. For example, [1], [2], [3], and
[4]. However, since the cache_line_size is an "unsigned int", the type
of the value is not guaranteed to be 64-bit long. Subsequently, the bit
twiddling hacks in [1], [2], [3], and [4] produce 32-bit mask, i.e.,
0x00000000FFFFFFC0.

This behavior at least caused a problem in LLSC in RISC-V [5], where the
load reservation (LR) relies on the mask to produce the cache block
address. Two distinct 64-bit addresses can be mapped to the same cache
block using the above mask.

This patch explicitly defines cache_line_size as a 64-bit unsigned int
so the cache block mask can be produced correctly for 64-bit addresses.

[1]
3bdcfd6f7a/src/cpu/simple/atomic.hh (L147)
[2]
3bdcfd6f7a/src/cpu/simple/timing.hh (L224)
[3]
3bdcfd6f7a/src/cpu/o3/lsq_unit.cc (L241)
[4]
3bdcfd6f7a/src/cpu/minor/lsq.cc (L1425)
[5]
3bdcfd6f7a/src/arch/riscv/isa.cc (L787)
2023-10-16 07:50:35 -07:00
Bobby R. Bruce
ddf6cb88e4 misc: Run pre-commit run --all-files
This is reflect the updates made to black when running `pre-commit
autoupdate`.

Change-Id: Ifb7fea117f354c7f02f26926a5afdf7d67bc5919
2023-10-10 14:01:58 -07:00
Matt Sinclair
ec633b3d68 dev-amdgpu,mem-ruby: Add support to checkpoint and restore between kernels in GPUFS (#377)
Earlier, GPU checkpointing was working only if a checkpoint was created
before the first kernel execution. This pull request adds support to
checkpoint in-between any two kernel calls. It does so by doing the
following.

- Adds flush support in the GPU_VIPER protocol
- Adds flush support in the GPUCoalescer
- Updates cache recorder to use the GPUCoalescer during simulation
cooldown and cache warmup times.
2023-10-10 09:41:21 -05:00
Matthew Poremba
75a7f30dfb dev-amdgpu: Implement GPU clock MMIOs
The ROCr runtime uses a combination of HSA signal timestamps and
hardware MMIOs to calculate profiling times. At the beginning of an
application a timestamp is read from the GPU using MMIOs. The clock
MMIOs reside in the GFX MMIO region, so a new AMDGPUGfx class is added
to handle these MMIOs.

The timestamp value is expected to be in nanoseconds, so we simply use
the gem5 tick converted to ns.

Change-Id: I7d1cba40d5042a7f7a81fd4d132402dc11b71bd4
2023-10-06 13:21:40 -05:00
Matthew Poremba
6a4b2bb096 dev-hsa,gpu-compute: Add timestamps to AMD HSA signals
The AMD specific HSA signal contains start/end timestamps for dispatch
packet completion signals. These are current always zero. These
timestamp values are used for profiling in the ROCr runtime.
Unfortunately, the GpuAgent::TranslateTime method in ROCr does not check
for zero values before dividing, causing applications that use profiling
to crash with SIGFPE. Profiling is used via hipEvents in the HACC
application, so these should be supported in gem5.

In order to handle writing the timestamp values, we need to DMA the
values to memory before writing the completion signal. This changes the
flow of the async completion signal write to be (1) read mailbox pointer
(2) if valid, write the mailbox data, other skip to 4 (3) write mailbox
data if pointer is valid (4) write timestamp values (5) write completion
signal. The application will process the timestamp data as soon as the
completion signal is received, so we need to ordering to ensure the DMA
for timestamps was completed.

HACC now runs to completion on GPUFS and has the same output was
hardware.

Change-Id: I09877cdff901d1402140f2c3bafea7605fa6554e
2023-10-06 13:21:40 -05:00
Hoa Nguyen
6f8b74ece8 dev,arch-riscv: Mark gem5's 8250 UART as 16550a compatible
8250 UART is supposed to be compatible to 16550a UART.

This enables OpenSBI to print things to UART as OpenSBI only
prints if the UART is 16550a compatible [1].

There is a similar change from gem5 gerrit [2] pointing out
that this also enables bbl to print things to UART. This is
confirmed :)

[1] https://github.com/riscv-software-src/opensbi/blob/v1.3.1/lib/utils/serial/fdt_serial_uart8250.c#L29
[2] https://gem5-review.googlesource.com/c/public/gem5/+/68481

Signed-off-by: Hoa Nguyen <hn@hnpl.org>
2023-10-06 00:48:12 -07:00
Vishnu Ramadas
f69191a31d dev-amdgpu: Remove duplicate writes to PM4 queue pointers
During checkpoint restoration, the unserialize() function writes rptr,
wptr, and indirect buffer rptr, wptr to PM4 queue's rptr, wptr fields.
This commit updates this to write only the relevant pointers to the
queue structure. If indirect buffers are used, then it writes only the
indirect buffer pointers to the queue. If they are not used, then it
writes rptr, wptr values to the queue.

Change-Id: Iedb25a726112e1af99cc1e7bc012de51c4ebfd45
2023-10-02 19:37:46 -05:00
Vishnu Ramadas
107e05266d dev-amdgpu: Add aql, hsa queue information to checkpoint-restore
GPUFS uses aql information from PM4 queues to initialize doorbells. This
commit adds aql information to the checkpoint so that it can be used
during restoration to correctly initialize all doorbells. Additionally,
this commit also sets the hsa queue correctly during checkpoint-restoration

Change-Id: Ief3ef6dc973f70f27255234872a12c396df05d89
2023-10-02 19:02:50 -05:00
Hoa Nguyen
1fc89bc8ae cpu,mem,dev: Use Addr for cacheLineSize
Change-Id: I2f056571dbf35081d58afda09726c600141d5a05
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
2023-09-20 14:16:46 -07:00
Matthew Poremba
63cabf2848 dev-amdgpu: Handle GPU atomics on host memory addresses
It is possible to execute a GPU atomic instruction using a memory
address that is in the host memory space (e.g, HMM, __managed__,
hipHostMalloc'd address). Since these are in host memory they are passed
to the SystemHub DmaDevice. However, this currently executes as a write
packet without modifying data. This leads to hangs in applications that
use atomics for forward progress (e.g., HeteroSync).

It is not clear where these are handled on a real GPU, but they are
certianly not handled by the software stack nor driver, so they must be
handled in hardware and therefore implemented in gem5. Handling for
atomics in the SystemHub makes the most sense.

To make atomics work a few extra changes need to be made to the
SystemHub. (1) The atomic is implemented as a host memory read, followed
by calling the AtomicOpFunctor, followed by a write. This requires a
second event to handle read response, performing atomic, and issuing a
write. (2) Atomics must be serialized otherwise two atomics might return
the same value which is incorrect. This patch adds serialization logic
for all request types to the same address to handle this. (3) With the
added complexity of the SystemHub, a new debug flag explicitly for
SystemHub is added.

Testing done: The heterosync application with input "sleepMutex 10 16 4"
previously hung before this patch. It passes with the patch applied.
This application tests both (1) and (2) above, as it allocates locks
with hipHostMalloc and has multiple workgroups sending an atomic request
in the same Tick, verifying the serialization mechanism.

Change-Id: Ife84b30037d1447dd384340cfeb06fdfd472fff9
2023-09-20 13:52:25 -05:00
Matthew Poremba
57b3d2897c gpu-compute: Use timing DMAs for GPUFS HSA signals
The functional HSA signal read was a hack left in the gpu-compute code.
In full system, this functional read is causing problems occasionally
with the translation not yet being in the page table. The error message
output by gem5 was a fatal message on the readBlob method in port proxy.
Changing this to a timing DMA fixes this problem.

This commit adds the various timing DMA functions to send and receive
response and clean up. A helper method "sendCompletionSignal" is added
to the GPUCommandProcessor because the indentation level was getting too
deep. This change applies only to FS mode. Code for SE mode is
equivalent to what it was before this commit.

Change-Id: I1bfcaa0a52731cdf9532a7fd0eb06ab2f0e09d48
2023-08-25 13:10:51 -05:00
Matthew Poremba
addba01d29 configs,dev-amdgpu: Add PCI express capability info
The ROCm stack requires PCI express atomics. Currently the first PCI
CapabilityPtr does not point to anything, which signals to the OS
(Linux) that this is an early generation PCI device. As PCI express
atomics were introduced later, the CapabilityPtr needs to point to at
least a PCI express capability structure. This capability is defined as
0x10 in Linux. We additionally set the PCI atomic based bits and
implement device specific PCI configuration space reads and writes to
the amdgpu device.

With this commit, the output of simulation when loading the amdgpu
driver no longer outputs "PCIE atomics not supported". Further, an
application which uses PCIe atomics (PyTorch with a reduce_sum kernel)
now makes further progress.

Change-Id: I5e3866979659a2657f558941106ef65c2f4d9988
2023-08-24 09:10:35 -05:00
Matthew Poremba
8b4c38302f dev: PCI: Fix PCI express capability union
The capabilities for PCI express is a struct, instead of a union, like
the other capability unions. A union is used here to provide access to
the ordinal data values when reading/writing an offset while
simultaneously providing human readable field values that can be set
when writing the code.

This commit changes it to union which is likely should be. Nothing
appears to be using this union yet so it is likely an oversight.

Change-Id: I85fe7cc62914525c70fd7a5946d725ed308f8775
2023-08-23 19:32:38 -05:00
Matthew Poremba
3b35e73eb8 dev-amdgpu: Implement SDMA constant fill
This SDMA packet is much more common starting around ROCm 5.4.
Previously this was mostly used to clear page tables after an
application ended and was therefore left unimplemented. It is
now used for basic operation like device memsets.

This patch implements constant fill as it is now necessary.

Change-Id: I9b2cf076ec17f5ed07c20bb820e7db0c082bbfbc
2023-07-30 13:17:05 -05:00
Ranganath (Bujji) Selagamsetty
ede4d89a83 arch-vega, dev-amdgpu: Fix for memory leaks
When using the new operator, delete should be called
on any allocated memory after it's use is complete.

Change-Id: Id5fcfb264b6ddc252c0a9dcafc2d3b020f7b5019
2023-07-28 19:14:46 -05:00
Bobby R. Bruce
753933d471 gpu-compute, tests: Fix GPU_X86 compilation, add compiler tests (#64)
* gpu-compute: Remove use of 'std::random_shuffle'

This was deprecated in C++14 and removed in C++17. This has been
replaced with std::random. This has been implemented to ensure
reproducible results despite (pseudo)random behavior.

Change-Id: Idd52bc997547c7f8c1be88f6130adff8a37b4116

* dev-amdgpu: Add missing 'overrides'

This causes warnings/errors in some compilers.

Change-Id: I36a3548943c030d2578c2f581c8985c12eaeb0ae

* dev: Fix Linux specific includes to be portable

This allows for compilation in non-linux systems (e.g., Mac OS).

Change-Id: Ib6c9406baf42db8caaad335ebc670c1905584ea2

* tests: Add 'VEGA_X86' build target to compiler-tests.sh

Change-Id: Icbf1d60a096b1791a4718a7edf17466f854b6ae5

* tests: Add 'GCN3_X86' build target to compiler-tests.sh

Change-Id: Ie7c9c20bb090f8688e48c8619667312196a7c123
2023-07-11 14:35:03 -07:00
Wei-Han Chen
b4687aa7d9 dev: Warn when resp packet is error in dma port
This CL adds a warning when the response packet is error.

Change-Id: I8e94dc2b85cd1753a4d6265cfda3cd5d6325f425
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71778
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Yu-hsin Wang <yuhsingw@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2023-07-07 07:12:54 +00:00
Matthew Poremba
079fc47dc2 dev-amdgpu: Perform frame writes atomically
The PCI read/write functions are atomic functions in gem5, meaning they
expect a response with a latency value on the same simulation Tick. For
reads to a PCI device, the response must also include a data value read
from the device.

The AMDGPU device has a PCI BAR which mirrors the frame buffer memory.
Currently reads are done atomically, but writes are sent to a DMA device
without waiting for a write completion ACK. As a result, it is possible
that writes can be queued in the DMA device long enough that another
read for a queued address arrives. This happens very deterministically
with the AtomicSimpleCPU and causes GPUFS to break with that CPU.

This change makes writes to the frame BAR atomic the same as reads. This
avoids that problem and as a result the AtomicSimpleCPU can now load the
driver for GPUFS simulations.

Change-Id: I9a8e8b172712c78b667ebcec81a0c5d0060234db
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71898
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
2023-06-29 19:56:49 +00:00
Giacomo Travaglini
b355baac93 dev-arm: Treat GICv3 reserved addresses as RES0
According to the GIC specification (IHI0069) reserved addresses in the
GIC memory map are treated as RES0.  We allow to disable this behaviour
and panic instead (reserved_res0 = False, which is what we have been
doing so far) to catch development bugs (in gem5 and in the guest SW)

Change-Id: I23f98519c2f256c092a52425735b8792bae7a2c7
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71138
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2023-06-05 15:01:59 +00:00
Matthew Poremba
6b4a1020be configs,dev-amdgpu: GPUFS MI200/gfx90a support
Add support for MI200-like device. This includes adding PCI IDs and new
MMIOs for the device, a different MAP_PROCESS packet, and a different
calculation for the number of VGPRs.

Change-Id: I0fb7b3ad928826beaa5386d52a94ba504369cb0d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70317
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2023-05-25 19:14:32 +00:00
Matthew Poremba
4d18546bfb dev-amdgpu: Update SDMA checkpointing
Patch https://gem5-review.googlesource.com/c/public/gem5/+/70040 added
support for a variable number of SDMA engines to support newer GPU
models. As part of this an SDMA IDs map was added to map from SDMA ID
number to the SDMA SimObject pointer. In order to get the correct
pointer in unserialize now, we need to store the ID in the checkpoint
and use that to index the new map. We can't simply assign using the loop
variable as the SDMAs might not be in order in the checkpoint and
additionally the checkpoint contains both the gfx and page offset for
the SDMA engines, so each SDMA is inserted into the SDMA offset map
(sdmaEngs) twice.

Change-Id: I08e9a8d785f467b6eebff8ab0a9336851c87258d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70878
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2023-05-23 14:28:16 +00:00
Matthew Poremba
08644a7670 dev-amdgpu: Fix nbio psp ring assert
The size of the packet changes between ROCm 4.x and ROCm 5.x. Change how
the address is set based on the incoming packet size so that both
versions continue to work for now.

Change-Id: I91694e4760198fd9129e60140df4e863666be2e2
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70677
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
2023-05-22 15:08:11 +00:00
Bobby R. Bruce
fcb36458e2 misc: Fix 'unused variable' clang errors with gem5.fast
Change-Id: I2bb8ac10e8db69fa82abe41577cd8e5db575e93d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70297
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
2023-05-08 22:54:06 +00:00
Melissa Jost
dd5b1a674e dev-amdgpu: Remove unused psp_ring_retval integer
This change addresses the compiler failures that have been
causing any GCN3_X86 build to fail.
https://jenkins.gem5.org/job/compiler-checks/589/

Change-Id: Ifd8e2ef89549752ca4aedf0bc9fa47e831a822d3
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70217
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2023-05-02 16:46:01 +00:00
Matthew Poremba
316538bf8a dev-amdgpu: Enable more GPUs with device specific registers
Currently gem5 assumes the amdgpu device to be Vega10. In order to
support more devices we need to handle situations where different
registers and addresses have the same functionality but different
offsets on different devices.

This changeset adds an NBIO class to handle device discovery and driver
initialization related tasks, pulling them out of the AMDGPUDevice
class. The offsets used for MMIOs are reworked slightly to use offsets
rather than absolute addresses. This is because we cannot determine the
absolute address in the constructor since the BAR has not been assigned
by the OS yet.

Change-Id: I14b364374e086e185978334425a4e265cf2760d0
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70041
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2023-04-28 00:48:35 +00:00
Matthew Poremba
8b91ac6f8d dev-amdgpu: Refactor MMIO interface for SDMA engines
Currently the amdgpu simulated device is assumed to be a Vega10. As a
result there are a few things that are hardcoded. One of those is the
number of SDMAs. In order to add a newer device, such as MI100+, we need
to enable a flexible number of SDMAs.

In order to support a variable number of SDMAs and with the MMIO offsets
of each device being potentially different, the MMIO interface for SDMAs
is changed to use an SDMA class method dispatch table with forwards a
32-bit value from the MMIO packet to the MMIO functions in SDMA of the
format `void method(uint32_t)`. Several changes are made to enable this:

 - Allow the SDMA to have a variable MMIO base and size. These are
   configured in python.
 - An SDMA class method dispatch table which contains the MMIO offset
   relative to the SDMA's MMIO base address.
 - An updated writeMMIO method to iterate over the SDMA MMIO address
   ranges and call the appropriate SDMA MMIO method which matches the
   MMIO offset.
 - Moved all SDMA related MMIO data bit twiddling, masking, etc. into
   the MMIO methods themselves instead of in the writeMMIO method in
   SDMAEngine.

Change-Id: Ifce626f84d52f9e27e4438ba4e685e30dbf06dbc
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70040
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2023-04-28 00:48:35 +00:00
Matthew Poremba
6c1b95ea41 dev-amdgpu: Default MMIO reads when previously written
If an MMIO was previously written and the driver reads it, we should
return the value that was previously read. This overwrites the MMIO
trace value which is the last resort fallback for finding an MMIO value.
This is needed to initialize newer GPU devices in gem5.

Change-Id: Ida2435290b706288e88518b5d920691cdb6dcc09
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70039
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2023-04-28 00:48:35 +00:00
Matthew Poremba
9c3107c762 dev-amdgpu,configs: Add human readable names for different GPUs
Add a human readable string for GPU device names rather than using the
device ID in the code. This is intended to make code more readable.

Change-Id: Id3ea74ca37422b1f4a0f09e5a9522d37b5998c1a
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70038
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
2023-04-28 00:48:35 +00:00
Vishnu Ramadas
f5af8b5876 dev-amdgpu: Add a few MQD attributes to GPUFS checkpoint
During GPUFS checkpoint restore, doorbells callbacks are created based
on certain MQD attributes. These callbacks are required to create new
SDMA doorbells. If these attributes are not present in the checkpoint,
the restore hangs indefinitely waiting for ioctl calls that access these
doorbells to finish execution. This commit adds the attributes required
for checkpoint restore to proceed.

Change-Id: Id3d1b7a2627d4c50133d923096495957a233f675
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70077
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
2023-04-27 21:15:46 +00:00
Matthew Poremba
c597361a6b dev-amdgpu: Add writeROM method
For non-KVM CPUs the VBIOS memory falls into an I/O hole and therefore
gets routed to the PIO bus in gem5. This gets routed to the GPU in the
case of a ROM write. We write to the ROM as a way to "load" the VBIOS
without creating holes in the KVM VM.

This write method allows the same scripts as KVM to be used by writing
to the ROM area and overwriting what might already be there from the
--gpu-rom option.

Change-Id: I8c2d2aa05a823569a774dfdd3bf2d2e773f38683
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70037
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
2023-04-22 19:57:26 +00:00
Richard Cooper
ed9effca73 dev-arm: Fix writes to Arm GICv2 GICD_IGROUPRn
Writes to the GICD_IGROUPRn registers are currently applied using the
`|=` operator, allowing bits to be set but not cleared. According to
the specification [1] this register should allow direct writes.

This patch changes the logic to write the new value directly to the
register.

[1] https://developer.arm.com/documentation/ihi0048/latest/

Change-Id: Ia5f17d05530263d7e918ff33576daaf8165c25c2
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/69682
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2023-04-13 21:09:36 +00:00
Richard Cooper
06637a29e5 arch-arm: Add more detailed debug messages to GICv2.
Converted the generic DPRINTF messages for the GICv2 register reads
and writes (showing only the memory mapped address) to finer grained
DPRINTF messages showing the names of the mapped registers being
accessed.

This change is intended to make it easier to debug the GIC setup from
the gem5 debug trace.

Change-Id: Ic418b2ea8438fed6a5a810ebc0b686cd4c891cb0
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/69681
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2023-04-13 21:09:36 +00:00
Gabe Black
716c154b51 arch,base,dev,sim: Convert objects to use the HostSocket param type.
This will make it possible to connect any of these objects with a
named socket, in addition to the usual port numbers.

Change-Id: Id441c3628f62d60608a07c5cb697786e33199981
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/69166
Reviewed-by: Jui-min Lee <fcrh@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Gabe Black <gabeblack@google.com>
Reviewed-by: Yu-hsin Wang <yuhsingw@google.com>
2023-04-12 02:18:22 +00:00
Gabe Black
2f5c87c7c6 dev: Add an "abortPending" method to the DMA port class.
This will abort any pending transactions that have been given to the
port.

Change-Id: Ie5f2c702530656a0c4590461369d430abead14cd
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/69437
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Gabe Black <gabe.black@gmail.com>
2023-04-08 08:02:30 +00:00
Vishnu Ramadas
8b7e55339a dev-amdgpu: Add GART translations to GPUFS checkpoint
Earlier, the GART entries were not being checkpointed. Therefore, during
checkpoint restore, certain SDMA instances were initialized with
incorrect addresses that led to incorrect behavior. This commit
checkpoints the GART entries and restores them.

Change-Id: I5464a39ed431e482ff7519b89bd5b664fd992ccf
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/69299
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2023-04-03 22:29:10 +00:00
Vishnu Ramadas
65e0bd6eb4 dev-amdgpu: Added PM4MapQueues to GPUFS checkpoint
The GPUFS checkpoint restoration mechanism expects to find a
PM4MapQueues packet in the checkpoint. Since this was not being
checkpointed, the restore phase retrieved a null packet which led to a
segmentation fault. This commit adds PM4MapQueues to the checkpoint and
restores it when deserializing the checkpoint

Change-Id: Ib74a9f36fe89d740a74f94314ada41ecc363abe9
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/69298
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
2023-04-03 22:28:57 +00:00