Commit Graph

15006 Commits

Author SHA1 Message Date
Giacomo Travaglini
4b98551aaf Update src/python/gem5/components/cachehierarchies/abstract_cache_hierarchy.py
Co-authored-by: Bobby R. Bruce <bbruce@ucdavis.edu>
2024-04-10 16:17:56 -04:00
Giacomo Travaglini
efe397ca92 stdlib: Add DTB generation capabilites to AbstractCacheHierarchy
Now that we are able to provide a view of the cache hierarchy from
the python world, we can start generating DTB entries for caches
and more specifically to properly fill the next-level-cache and
cache-level properties

Change-Id: Iba9ea08fe605f77a353c9e64d62b04b80478b4e2
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-04-10 16:17:56 -04:00
Giacomo Travaglini
e6637fc852 stdlib: Use newly defined tree for PrivateL1PrivateL2 hierarchy
Change-Id: I803c6118c4df62484018f9e4d995026adb1bbc2c
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-04-10 16:17:56 -04:00
Giacomo Travaglini
d67672facc stdlib: Add tree structure to the AbstractCacheHierarchy
One of things we miss in gem5 is the capability to neatly compose the
cache hierarchy of CPUs and clusters of CPUs.  The BaseCPU
addPrivateSplitL1Caches and addTwoLevelCacheHierarchy APIs have
historically been used to bind cache levels together.

These APIs have been superseeded by the introduction of the Cache
hierarchy abstraction in the standard library. The standard library
makes it cleaner for a user to quickly instantiate a hierarchy of caches
with few lines of code.  While this removes a lot of complexity for a
user, the Hierarchy objects still have little information about their
internal topology.

To address this problem, this patch adds a tree data structure to the
AbstractCacheHierarchy class, where every node of the tree represent
a cache in the hierarchy. In this way we will expose APIs for traversing
and querying the tree.

For example a 2 CPUs system with private L1, private L2 and shared L3
will contain the following tree:

         [root]
           |
          [L3]
           /\
          /  \
        [L2] [L2]
         |    |
        [L1] [L1]

Change-Id: I78fe6ad094f0938ff9bed191fb10b9e841418692
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-04-10 16:17:56 -04:00
Bobby R. Bruce
3af15a535e mem-cache, configs, arch-arm: Handle partitioning policies through a PartitionManager (#966)
This PR is offloading some of the partitioning logic to the partitioning
manager, effectively changing
the partitioning interface. Rather than always relying on the
PartitionFieldExtention data structure to
convey partition IDs, we make it implementation defined by introducing
the partitioning manager abstraction.
We want user to be able to extract the partitionId more flexibly and
this requires using a SimObject.

Users can extend the PartitioningManager, overriding the
readPacketPartitionId, therefore providing their
own mean of injecting/extracting partitioning data from a packet
2024-04-08 16:05:17 -07:00
Ivana Mitrovic
a8d778516d arch-riscv,sim: m5ops argument / return fix for 32 bit RISC-V (#900)
M5Ops C / C++ functions partially use 64 bit arguments and return value.
In general, 64 bit arguments and return values are possible for 32 bit
RISC-V systems as well, since the arguments and the return value is
split into two registers. However, at the moment, this does not work for
32 bit RISC-V systems on the simulator side, since there is a one to one
mapping between argument registers and m5op function parameters.

To solve this problem, the get() function of the RISC-V reg_abi is
updated. It now will merge two registers if there is a 64 bit argument.
For this, the function code has to be passed to the get() function. The
default value of this function code is set to 0xF00, since 0x00 is
already used for M5_ARM. The parameter list of other get() functions for
argument return is also extended by this function code parameter with
the keyword [[maybe_unused]].

To enable a return value of size 64 bit, a0 is assigned with the lower
32 bit and a1 with the higher 32 bit.

Related Issue: https://github.com/gem5/gem5/issues/881
2024-04-08 10:09:17 -07:00
Robert Hauser
841b821261 arch-riscv: fix c.fswsp source register (#998)
RISC-V C.FSWSP format (RISC-V Unprivileged ISA V20191213, page 102):
 
|15-13|12-7|6-2|1-0|
|-------|----|----|----|
|funct3|imm|rs2|op|

Source register is bit 2-6, not bit 20-24


ee6f1377d7/src/arch/riscv/isa/bitfields.isa (L111)


ee6f1377d7/src/arch/riscv/isa/operands.isa (L86)


ee6f1377d7/src/arch/riscv/isa/bitfields.isa (L87)


ee6f1377d7/src/arch/riscv/isa/operands.isa (L80)
2024-04-08 08:41:11 -07:00
Yu-Cheng Chang
71b0b1f2b6 arch-riscv: Fix c.fsw source register (#1005)
RISC-V C.FSW format:


![image](https://github.com/gem5/gem5/assets/32214817/31f46525-23e1-4b36-91ee-968f18b9d32a)
Source register is bit 2-4, not bit 20-24
 

ee6f1377d7/src/arch/riscv/isa/bitfields.isa (L112)


ee6f1377d7/src/arch/riscv/isa/operands.isa (L88)


ee6f1377d7/src/arch/riscv/isa/bitfields.isa (L87)


ee6f1377d7/src/arch/riscv/isa/operands.isa (L80)
2024-04-08 08:30:54 -07:00
Giacomo Travaglini
bdb08a5b6c arch-arm, dev-arm: Fix typo in PartitionFieldExtention name
Rename PartitionFieldExtention into PartitionFieldExtension

Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Change-Id: I8072adf78d81b94c5b8bc61a317c0238cf0a9fd9
2024-04-07 11:45:57 +01:00
Giacomo Travaglini
dd45e1c319 misc: Make PartitionFieldExtention private to Arm
The new ISA-agnostic interface is the PartitionManager.
We therefore make the PartitionFieldExtention private to the
Arm implementation of memory partitioning (FEAT_MPAM)

Any other partitioning implementation should override the
PartitionManager::readPacketPartitionID to provide a mean
for extracting partitioning data (partition_id) from the
incoming Packet.

With this commit we also define an MPAM MSC which is
supposed to be the partitioning manager for the
Memory System Component

Change-Id: I6959ace0c0cbca549dcc1aacd53dff223b5fe328
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-04-07 11:45:57 +01:00
Bobby R. Bruce
c65071282d stdlib,tests: Add StatTester SimObject and Scalar tests (#973)
This SimObject can be used to quickly test the statistics are
functioning correctly. The SimObject schedules a single event which sets
the statistics to values dependent on the SimObject params.

With this commit the "Scalar" stats have a StatTester subclass that can
be used for testing. More can be added as required.

Tests are included to check our Scalar SimStat functionality.
2024-04-04 19:12:44 -07:00
Bobby R. Bruce
504005da87 scons,tests: Add 'USE_TEST_OBJECTS' kconfig
This has the SimObjects defined in "src/test_objects" only be compiled
into the gem5 binary if the Kconfig 'USE_TEST_OBJECTS" == 'y'. This
happens in two cases:

1. When 'ALL/gem5' is compiled via "build_opts".
2. When tests are run via "./tests/main.py".

Change-Id: I2330008fd7c7900de5f4de142b8ac89ef4e351ce
2024-04-04 13:04:21 -07:00
Bobby R. Bruce
2771061207 stdlib,tests: Add StatTester SimObject and Scalar tests
This SimObject can be used to quickly test the statistics are
functioning correctly. The SimObject schedules a single event which sets
the statistics to values dependent on the SimObject params.

With this commit the "Scalar" stats have a StatTester subclass that can
be used for testing. More can be added as required.

Tests are included to check our Scalar SimStat functionality.

Change-Id: I78fa5d9a0c3fc7115bd6c6d3410a5436aaa47f55
2024-04-04 13:04:20 -07:00
Bobby R. Bruce
8d7e3fb16b stdlib: Move SimStat 'unit' and 'datatype' field to Scalar (#970)
These are not general statistic properties and better put as a property
of a Scalar value.
2024-04-04 10:02:22 -07:00
Bobby R. Bruce
213b418391 stdlib: Specify typing for SimStat Scalar value (#971) 2024-04-04 08:34:20 -07:00
Bobby R. Bruce
4ff34a75bb stdlib: Fix 'nozero' for Scalar SimStats (#972)
When the `statistics::nozero` flag is set gem5 does not output that stat
if its value is zero. This was not the case for SimStats which output in
this case. This patch correct this behavior.
2024-04-04 08:33:48 -07:00
Giacomo Travaglini
0c6543d781 python: Add is_subset to the AddrRange param class (#993)
This will just call the _m5.range.isSubset method

Change-Id: If747819a008a8ed20796b4efd42a42e5c3a8d7d9

Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-04-04 08:12:30 +01:00
Minje Jun
ffd0680a2c mem-ruby: Copyback UD_RU line when evicted in CHI protocol (#945)
This is a followed up fix to #791 mem-ruby: Fix possible dirty line loss
in CHI when ReadShared hit on UD line.
UD_RU line may have stale data since the upstream could have updated the
line, so its local cache line data is treated as invalid
(dataValid=false). But when the line is evicted, it must be written back
to downstream because the upstream may have the line in clean state
(UC). This change fixes it by performing copy back the UD_RU line while
keeping its dataValid as false.

Example error case:
- L3 was in UD_RSC and being evicted without back-invalidation. LLC (HN)
was in RU state.
- Because there's still upstream sharer, L3 sends WriteClean.
- Because the data state was unique and dirty, L3 sends CBWrData_UD_PD.
- LLC becomes UD_RU.
- When the line is evicted from LLC (LocalHN_Eviction), the line is just
dropped, causing the loss of the dirty copy

Co-authored-by: Minje Jun <minje.jun@samsung.com>
2024-04-03 08:33:22 -07:00
Yu-Cheng Chang
1fa25a60c8 arch-riscv: Fix the RiscvBareMetal parameter reset_vect (#964)
The `reset_vect` has exist for a long time and `reset_vect` will not
effect if the user gonna to use customized reset_vect. The CL added the
`auto_reset_vect` to let the config determine the `reset_vect` from
workload entry point or user-specified

Ref: https://gem5-review.googlesource.com/c/public/gem5/+/42053

Change-Id: I928c0dc42aaa85ceabf8d75f9654486496e0ffee
2024-04-03 08:31:57 -07:00
Kaustav Goswami
28b081b348 arch-arm,stdlib: ARM release for_kvm is moved to configs (#986)
This change sets the `release` of the ARM board at the config file
instead of overriding the release on the ArmBoard. This change partially
solves issue 932 as the system taking and restoring the checkpoint is
consistent across KVM and timing CPUs respectively.

Signed-off-by: Kaustav Goswami <kggoswami@ucdavis.edu>
2024-04-03 11:48:24 +01:00
Nicholas Mosier
32ee09df4a sim-se: Fix copyOutStatxBuf compile error (#989)
Fix #988. Rewrite statxFunc and copyOutStatxBuf to use platform-agnostic
stat system call, not Linux-specific statx system call.

Change-Id: I3d17e14684e9cd77cdbfd0141b93c3bcbd27dbeb
2024-04-02 14:59:24 -07:00
Bobby R. Bruce
c238b7a3e0 base: Fix 'doGzipLoad' str manipulation (#959)
When running `scons build/ALL/gem5.opt --with-ubsan`, with GCC, the
following error was returned:

```
[     CXX] src/base/loader/image_file_data.cc -> ALL/base/loader/image_file_data.o
In file included from /usr/include/string.h:535,
                 from /usr/include/c++/11/cstring:42,
                 from src/base/cprintf_formats.hh:33,
                 from src/base/cprintf.hh:38,
                 from src/base/logging.hh:49,
                 from src/base/loader/image_file_data.cc:40:
In function ‘char* strcpy(char*, const char*)’,
    inlined from ‘int gem5::loader::doGzipLoad(int)’ at src/base/loader/image_file_data.cc:70:11,
    inlined from ‘gem5::loader::ImageFileData::ImageFileData(const string&)’ atsrc/base/loader/image_file_data.cc:116:24:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:79:33: error: ‘void* __builtin_memcpy(void*, const void*, long unsigned int)’ offset [0, 19] is out of the bounds [0, 0] [-Werror=array-bounds]
   79 |   return __builtin___strcpy_chk (__dest, __src, __glibc_objsize (__dest));
      |          ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors
scons: *** [build/ALL/base/loader/image_file_data.o] Error 1
scons: building terminated because of errors.
```
2024-04-02 10:37:42 -07:00
Hoa Nguyen
628826896f arch-riscv: Use TeX's escape seq in Python instead of Unicode (#985)
Currently, the citation string has a Unicode character. This works well
in gem5, but it breaks the gem5+SST simulation [1]. This change modifies
the letter "u" with umlaut to use TeX's escape sequence for this letter
instead of using the UTF-8 character.

[1] https://github.com/gem5/gem5/issues/982

Signed-off-by: Hoa Nguyen <hn@hnpl.org>
2024-04-02 08:42:21 -07:00
Robert Hauser
f9a9e50007 sim: adding constructor to GuestAddr
A constructor is added to GuestAddr as suggested in the pull request
feedback. This allows a cast conversion from uint64_t GuestAddr. Hence,
the casting from uint64_t to GuestAddr by reinterpret_cast is removed
(was added in a previous commit).

using namespace pseudo_inst is also removed as requested.

Comments are added to GuestAddr.

Change-Id: Ib76de2bff285f4e53ad03361969c27f7bb2dfe9e
2024-04-01 18:05:56 +00:00
Matthew Poremba
78cf39bf63 arch-vega: Operand selectors for accumulation registers (#955)
AMD's MI100 introduced a new register file called accumulation registers
for the matrix cores. In MI200 these were recombined into the same
register file according to the documentation. The accumulation register
file is the same size as the architectural register file, hence the size
is doubled.

The ISA spec does not explicitly state the register selector values,
however it does say that the accumulation offset from the kernel
dispatch packet should be added to the architecture register file
selector number when an instruction sets the ACC bit. Therefore we can
infer that the value must simply be an extension beyond the
architectural VGPRs.

This fixes errors of the form "invalid register selector: 512" (or
higher value). This was tested with the Learn the Basics tutorial
example on pytorch.org

Change-Id: I48ced1532fc166d2f5032fe21fbeba70ac77f258
2024-04-01 08:45:37 -07:00
Nicholas Mosier
00d4b6825c sim-se: Implement statx system call for Linux x86-64 (#887)
Implement the statx Linux-specific system call for x86-64. statx is used
by LLVM's libc.

Change-Id: Ic000a36a5e5c1399996f520fa357b9354c73c864
2024-04-01 08:23:39 -07:00
Debjyoti Bhattacharjee
ec690de0da arch-riscv: This commit fixes bug in vfmv.f.s impl. in riscv (#863)
The existing implementation of vfmv instruction did not type cast the
first element of the source vector, which caused the "freg" to interpret
the result as a NaN.

With the type cast to f32, the value is correctly recognized as float
and sign extended to be stored in the fd register.

Git issue: https://github.com/gem5/gem5/issues/827

Change-Id: Ibe9873910827594c0ec11cb51ac0438428c3b54e

---------

Co-authored-by: Debjyoti B <bhatta53@imec.be>
Co-authored-by: Tommaso Marinelli <tommarin@ucm.es>
2024-03-29 08:23:14 -07:00
Harshil Patel
9207458fd7 stdlib: add socks proxy to atlas client (#864) 2024-03-28 14:30:02 -07:00
Bobby R. Bruce
55c58da504 base: Convert doGzipLoad to use std::string instead of *char
Change-Id: I28c9bf7853267686402b43be00f857914770f7a7
2024-03-28 14:23:13 -07:00
Giacomo Travaglini
63706f04b5 dev: Remove duplicate virtio files (#976)
Remove the following files:
* src/dev/virtio/rng 2.cc
* src/dev/virtio/rng 2.hh

Which were a copy of rng.hh and rng.cc. Probably added to the repository
by accident. They were not compiled by scons


Change-Id: I9d1da19cc243c513ab7af887b1b6260d8e361b57

Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-03-28 14:32:11 +00:00
Yu-Cheng Chang
896c32cd0d arch: Add getIsaName in BaseISA (#975)
Change-Id: I81bfcd691d570430f7011f0d5023e5ea613e0dd9
2024-03-28 13:27:32 +00:00
Giacomo Travaglini
9ab97c8930 mem-cache: Move partitioningPolicies to the PartitionManager
Change-Id: I13b41e918ed3864e1a52940786b3eec063253e1d
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-03-26 12:12:24 +00:00
Giacomo Travaglini
d0539fe7cb mem-cache: Define a PartitionManager to handle partitioning
This is a first step towards offloading some of the partitioning
logic to the partitioning manager. We start with this patch
by replacing the static readPacketPartitionId into a virtual
method owned by the manager.

The issue with readPacketPartitionId as of now is that it relies
on the fixed PartitionFieldExtention.
We want user to be able to extract the partitionId more flexibly
and this requires using a SimObject

Change-Id: I3bd2e81e2a97c55fc83548956fc59f422c8049a6
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-03-26 12:12:15 +00:00
Carson Molder
dd5a30d41e sim-se,cpu-kvm: Fix SE workload setup on KVM CPUs (#956)
This PR fixes #948 in which running KVM CPUs through the updated gem5
interface in SE mode causes an immediate crash.

To fix this, I added a check to set_se_binary_workload that checks if
any of the cores are KVM, and if so, sets a couple of knobs for the
board and process that are required to make KVM work. The depecated
se.py script, which sets these knobs, is able to run KVM in SE mode just
fine, so doing the same here fixed the bug.
2024-03-23 15:15:11 -07:00
Bobby R. Bruce
8249fa8dee base: Fix 'doGzipLoad' str manipulation
When running `scons build/ALL/gem5.opt --with-ubsan`, with GCC, the
following error was returned:

```
[     CXX] src/base/loader/image_file_data.cc -> ALL/base/loader/image_file_data.o
In file included from /usr/include/string.h:535,
                 from /usr/include/c++/11/cstring:42,
                 from src/base/cprintf_formats.hh:33,
                 from src/base/cprintf.hh:38,
                 from src/base/logging.hh:49,
                 from src/base/loader/image_file_data.cc:40:
In function ‘char* strcpy(char*, const char*)’,
    inlined from ‘int gem5::loader::doGzipLoad(int)’ at src/base/loader/image_file_data.cc:70:11,
    inlined from ‘gem5::loader::ImageFileData::ImageFileData(const string&)’ atsrc/base/loader/image_file_data.cc:116:24:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:79:33: error: ‘void* __builtin_memcpy(void*, const void*, long unsigned int)’ offset [0, 19] is out of the bounds [0, 0] [-Werror=array-bounds]
   79 |   return __builtin___strcpy_chk (__dest, __src, __glibc_objsize (__dest));
      |          ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors
scons: *** [build/ALL/base/loader/image_file_data.o] Error 1
scons: building terminated because of errors.
```

I do not know the exact issue but using strcpy in this way (i.e.
`strcpy(char_pointer + offset, string)`) appears to trigger this error
with the undefined behavior sanitizer. The fix in this patch replaces
this with `strcat`.

Change-Id: I1a0c50c9022adc841e175aad0fe2247bfcb29d71
2024-03-23 15:07:26 -07:00
Ivan Fernandez
1e743fd85a arch-riscv: adding vector unit-stride segment stores to RISC-V (#913)
This commit adds support for vector unit-stride segment store operations
for RISC-V (vssegXeXX). This implementation is based in two types of
microops:
- VsSegIntrlv microops that properly interleave source registers into
structs.
- VsSeg microops that store data in memory as contiguous structs of
several fields.

Change-Id: Id80dd4e781743a60eb76c18b6a28061f8e9f723d

Gem5 issue: https://github.com/gem5/gem5/issues/382
2024-03-22 15:45:58 -07:00
Matthew Poremba
7d62da6d10 dev-amdgpu: Support for ROCm 6.0 (#926)
Implement several features new in ROCm 6.0 and features required for
future devices. Includes the following:

- Support for multiple command processors
- Improve handling of unknown register addresses
- Use AddrRange for MMIO address regions
- Handle GART writes through SDMA copy
- Implement PCIe indirect reads and writes
- Improve PM4 write to check dword count
- Implement common MI300X instruction
2024-03-21 21:12:09 -07:00
Matthew Poremba
dca040983b arch-vega: Various vega fixes to enable nanogpt (#950)
This PR fixes some issues observed that were needed to get nanogpt
working.
2024-03-21 21:11:44 -07:00
Michael Boyer
803dbbfdac arch-vega: Implement flat_load_sbyte instruction (#953)
Change-Id: I642a71c504e2d3afecd5d2dfd9db016945aed21b
2024-03-21 21:11:10 -07:00
Matthew Poremba
823b5a6eb8 dev-amdgpu: Support multiple CPs and MMIO AddrRanges
Currently gem5 assumes that there is only one command processor (CP)
which contains the PM4 packet processor. Some GPU devices have multiple
CPs which the driver tests individually during POST if they are used or
not. Therefore, these additional CPs need to be supported.

This commit allows for multiple PM4 packet processors which represent
multiple CPs. Each of these processors will have its own independent
MMIO address range. To more easily support ranges, the MMIO addresses
now use AddrRange to index a PM4 packet processor instead of the
hard-coded constexpr MMIO start and size pairs.

By default only one PM4 packet processor is created, meaning the
functionality of the simulation is unchanged for devices currently
supported in gem5.

Change-Id: I977f4fd3a169ef4a78671a4fb58c8ea0e19bf52c
2024-03-21 10:13:55 -05:00
Matthew Poremba
39153cd234 dev-amdgpu: Implement PCIe indirect read/write
PCIe can read/write to any 32-bit address using the PCI index/index2
registers as an address and then reading/writing the corresponding
data/data2 register.

This commit adds this functionality and removes one magic value being
written to support GPU POST. This feature is disabled for Vega10 which
relies on an MMIO trace for too many values to implement in the MMIO
interface.

Change-Id: Iacfdd1294a7652fc3e60304b57df536d318c847b
2024-03-21 10:13:55 -05:00
Matthew Poremba
047c194780 dev-amdgpu: Implement SRBM write
The SRBM write packets where previously not required. This commit
implements SRBM writes to set a register by using the new setRegVal
interface. SRBM writes seem to be used for SRIOV enabled devices.

Change-Id: I202653d339e882e8de59d69a995f65332b2dfb8c
2024-03-21 10:10:01 -05:00
Matthew Poremba
6bbde8fbb8 dev-amdgpu: Rework handling of unknown registers
The top level AMDGPUDevice currently reads/writes all unknown registers
to/from a map containing the previously written value. This is intended
as a way to handle registers that are not part of the model but the
driver requires for functionality. Since this is at the top level, it
can mask changes to register values which do not go through the same
interface. For example, reading an MMIO, changing via PM4 queue, and
reading again returns the stale cached value.

This commit removes the usage of the regs map in AMDGPUDevice,
implements some important MMIOs that were previously handled by it, and
moves the unknown register handling to the NBIO aperture only. To reduce
the number of additional MMIOs to implement, the display manager in
vega10 is now disabled.

Change-Id: Iff0a599dd82d663c7e710b79c6ef6d0ad1fc44a2
2024-03-21 10:10:01 -05:00
Matthew Poremba
009cec56e0 dev-amdgpu: Check for SDMA copies to GART range
The SDMA engine can potentially be used to write to the GART address
range. Since gem5 has a shadow copy of the GART table to avoid sending
functional reads to device memory, the GART table must be updated when
copying to the GART range.

This changeset adds a check in the VM for GART range and implements the
SDMA copy packet writing to the GART range. A fatal is added to write
and ptePde, which are the only other two ways to write to memory, as
using these packets to update the GART table has not been observed.

Change-Id: I1e62dfd9179cc9e987659e68414209fd77bba2bd
2024-03-21 10:10:01 -05:00
Matthew Poremba
998709d4fc dev-amdgpu: Improve PM4 write data packet
The write data packet can write multiple dwords but currently always
assumes there is one dword, which can cause some write data to be
missed. This case is not common, but the number of dwords is implicitly
defined in the PM4 header.

This changeset passes the PM4 header to write data so that the correct
number of dwords can be determined. For now we assume no page crossing
when writing multiple dwords as the driver should be checking for that.

Change-Id: I0e8c3cbc28873779f468c2a11fdcf177210a22b7
2024-03-21 10:10:01 -05:00
Matthew Poremba
c045c68540 dev-amdgpu: Add node_id to interrupt handler
The ROCm 6.0 driver adds a node_id field to interrupts which must match
before passing on the interrupt to be cleared by the cookie from gem5's
interrupt handler implementation. Add this field and enable for gfx942.

The usage of the field can be seen in event_interrupt_isr_v9_4_3 at
https://github.com/ROCm/ROCK-Kernel-Driver/blob/roc-6.0.x/drivers/
    gpu/drm/amd/amdkfd/kfd_int_process_v9.c#L449

Change-Id: Iae8b8f0386a5ad2852b4a3c69f2c161d965c4922
2024-03-21 10:10:01 -05:00
Matthew Poremba
9ab004cccc arch-vega: Implement V_LSHL_ADD_U64
This is a new instruction in MI300 and operates similar to
V_LSHL_ADD_U32 but on 64-bit values.

Change-Id: Ia4ac65160bdad748fccdcb28286ba03157cc4046
2024-03-21 10:10:01 -05:00
Matthew Poremba
f36be791aa arch-vega: Expand FLAT subDecode range in main decoder
The main decoder for GPU instructions looks at the first 9 bits of a
dword to determine either the instruction or a subDecode table with more
information for specific instructions types. For flat instructions the
first 9 bits currently consist of 6 fixed encoding bits, a reserved bit,
and the first two bits of the opcode. Hence to support all opcodes there
are four indirections to the flat subDecode table. In MI300 the reserved
bit is part of a field to determine memory scope and therefore may be
non-zero.

This commit adds four addition calls to the subDecode table for the
cases where the scope bit is 1. See page 468 (PDF page 478) below:

https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/
    instruction-set-architectures/
    amd-instinct-mi300-cdna3-instruction-set-architecture.pdf

Change-Id: Ic3c786f0ca00a758cbe87f42c5e3470576f73a32
2024-03-21 10:10:01 -05:00
Michael Boyer
acd9d3ff94 gpu-compute: Add support for skipping GPU kernels (#940)
gpu-compute: Add support for skipping GPU kernels

This commit adds two new command-line options:

--skip-until-gpu-kernel N
Skips (non-blit) GPU kernels until the target kernel is reached.
Execution continues normally from there. Blit kernels are not skipped
because they are responsible for copying the kernel code and metadata
for the non-blit kernels. Note that skipping kernels can impact
correctness; this feature is only useful if the kernel of interest has
no data-dependent behavior, or its data-dependent behavior is not based
on data generated by the skipped kernels.

--exit-after-gpu-kernel N
Ends the simulation after completing (non-blit) GPU kernel N.

This commit also renames two existing command-line options:
--debug-at-gpu-kernel -> --debug-at-gpu-task
--exit-at-gpu-kernel  -> --exit-at-gpu-task

These were renamed because they count GPU tasks, which include both
kernels launched by the application as well as blit kernels.

Change-Id: If250b3fd2db05c1222e369e9e3f779c4422074bc
2024-03-21 07:46:27 -07:00
Matthew Poremba
e02f329d5d arch-vega: Fix VOP3 decode table off-by-one
There is no VOP3 opcode 667. Mark that invalid and move the opcodes
after down by one.

Change-Id: Ia4ccda91f6f501c1ce7c5898d7d0e924604a459a
2024-03-20 16:41:31 -05:00