Commit Graph

15236 Commits

Author SHA1 Message Date
Yu-Cheng Chang
ce8db85867 cpu: Add cpuIdlePins to indicate the threadContext of CPU is idle (#1285)
If a threadContext of the CPU enters suspend mode, raise the
cpu_idle_pins pin for that thread ID, driving a high signal to the
target. If the threadContext enters active mode, lower the
cpu_idle_pins pin for that thread ID, driving a low signal to the target.
2024-07-10 10:36:37 +01:00
Yu-Cheng Chang
d54dcac393 arch-riscv: Fix setRegs from GDB failed after #1099 (#1291)
gem5 crashed when the user tried to update a register value from GDB
because PR [1] changed the index of CSR_XSTATUS to MISCREG_XSTATUS, which
is outside NUM_PHYS_MISCREGS.

CSR_XSTATUS should be updated with setRegWithMask instead.

[1] : https://github.com/gem5/gem5/pull/1099

gem5 issue: https://github.com/gem5/gem5/issues/1299

Change-Id: Iefc0d1f5adfb98ecfda0e74907964b47d1864b6d
2024-07-09 15:55:35 -07:00
Jason Lowe-Power
d20512c291 arch-riscv: add agnostic option to vector tail/mask policy for mem and arith instructions (#1135)
These two commits add agnostic capability for both tail/mask policies,
for vector memory and arithmetic instructions respectively. The common
policy for instructions is to act as undisturbed if either policy (tail
or mask) is undisturbed, and to write all 1s only if neither is.
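
The policy described above can be sketched in a few lines. This is only an illustration of one reading of the rule, not the gem5 implementation: the function name, the list-based register model and the 8-bit element width are all assumptions made for the example.

```py
def apply_tail_mask_policy(old, new, mask, vl, vta_agnostic, vma_agnostic,
                           elem_bits=8):
    """Hypothetical model of the combined tail/mask policy.

    old, new: per-element values of the old destination and the result.
    mask:     per-element booleans for the body elements (index < vl).
    vl:       number of body elements; the rest are tail elements.
    """
    all_ones = (1 << elem_bits) - 1
    # Write all 1s only when both policies are agnostic; otherwise
    # behave as undisturbed for both tail and masked-off elements.
    write_ones = vta_agnostic and vma_agnostic
    out = []
    for i, (old_elem, new_elem) in enumerate(zip(old, new)):
        if i < vl and mask[i]:
            out.append(new_elem)   # active element: take the result
        elif write_ones:
            out.append(all_ones)   # inactive element, agnostic policy
        else:
            out.append(old_elem)   # inactive element, undisturbed policy
    return out


old = [1, 2, 3, 4]
new = [10, 20, 30, 40]
mask = [True, False, True, True]
both_agnostic = apply_tail_mask_policy(old, new, mask, 3, True, True)
one_undisturbed = apply_tail_mask_policy(old, new, mask, 3, True, False)
```

Here element 1 is masked off and element 3 is in the tail, so only those two differ between the two calls.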

For those instructions in which multiple micro instructions are
instantiated to write to the same register (`VlStride` and `VlIndex` for
memory, and `VectorGather`, `VectorSlideUp` and `VectorSlideDown` for
arithmetic), a (new) micro instruction named `VPinVdCpyVsMicroInst` has
been used to pin the destination register so that there's no need to
copy the partial results between them. This idea is similar to what is
done in ARM's SVE code. This micro instruction also implements the
tail/mask policy for these cases.

Finally, it's worth noting that while now using an agnostic policy for
both tail/mask should remove all dependencies with old destination
registers, there's an exception with `VectorSlideUp`. The
`vslideup_{vx,vi}` instructions need the elements in the offset to be
unchanged. The current implementation overrides the current vta/vma and
makes them act as undisturbed, since they require the old destination
register anyway. There's a minor issue with this though: the
`v{,f}slide1up` variants do not need this but, since they share the same
constructor, they behave the same way.

Related issue #997.
2024-07-08 11:47:11 -07:00
Robert Hauser
77528d1928 systemc: Use headerDelay in timing annotation (#1328)
1. Responder (downstream components):

    When sending a BEGIN_REQ, the timing annotation marks the time when
    a transaction is visible to the target (see [1] on page 465).

    When writing the data, the downstream component calculates the
    transfer time and would send END_REQ after this time (see [1] on
    page 540). Therefore, not the payloadDelay, but the headerDelay
    should be used, as already written as a comment in the source files.
    When reading data, payloadDelay will be 0 anyway.

2. Requester (upstream component):

    For data read, the begin of the transfer is marked by BEGIN_RESP
    and the upstream component would delay END_RESP to model the
    data transfer (see [1] on page 540). Therefore, BEGIN_RESP should be
    delayed by the headerDelay, not the payloadDelay.

[1] "IEEE Standard for Standard SystemC® Language Reference Manual," in
IEEE Std 1666-2023 (Revision of IEEE Std 1666-2011), vol., no.,
pp.1-618, 8 Sept. 2023, doi: 10.1109/IEEESTD.2023.10246125.

Change-Id: I3b5e8ad6bc37cbb309b124efdc8764fca3728b7a

Signed-off-by: Robert Hauser <robert.hauser@uni-rostock.de>
2024-07-05 09:05:24 -07:00
Giacomo Travaglini
d825103df2 arch-arm: Implement FEAT_TTST (#1323)
Implement small translation table extension.
This feature relaxes the lower limit on the size of the translation
tables, by increasing the maximum permitted values of the T1SZ and T0SZ
fields in TCR_EL1, TCR_EL2, TCR_EL3, VTCR_EL2 and VSTCR_EL2.

Change-Id: I4c2187815b2d7f14407edb38095c6bcc2004b62a

Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-07-04 09:37:41 +01:00
Giacomo Travaglini
c9d9108978 arch-arm: MISCREG_AT_S1E2R/W are executable from S state (#1322)
Change-Id: Ieaebdf0d62b5115f8085f478b2da105633b6a26a

Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-07-04 09:37:17 +01:00
Giacomo Travaglini
f3e3c60805 arch-arm: Proper support for NonSecure IPA space in Secure state
Change-Id: Ie2e2278ecdc5213db74999e3561b2918937c2c2e
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-07-02 13:16:13 +01:00
Giacomo Travaglini
eb400e773b arch-arm: Remove makeStage2 from TLBIOp
Change-Id: I25276e4b5b7c491e69208044ceb193c67ddfd91c
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-07-02 13:15:49 +01:00
Giacomo Travaglini
49ca08b01a arch-arm: Add isStage2 qualifier to the LongDescriptor
We are currently using the LongDescriptor for both stage1
and stage2 translations. There are several cases where
the bitfield meaning changes depending on the translation
stage.

Change-Id: Ic33d9ef225a57fd79ce2b4bf47896aeb6bdd8d9c
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-07-02 13:15:31 +01:00
Giacomo Travaglini
9cce68ca71 arch-arm: Replace isSecure boolean with SecurityState enum
Change-Id: If01b8b2811b2c028e669ea3700174c7945b07a06
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-07-02 12:45:24 +01:00
Alexander Richardson
d5c0383887 arch-arm: support 64-bit PMCCNTR from AArch32 (#1304)
For ARMv8 CPUs this register allows reading a 64-bit cycle counter
from the 32-bit execution state.

Change-Id: I7cd9e2711ada5156920440cc3c89e7a74ca54a49
2024-07-02 08:59:44 +01:00
Giacomo Travaglini
b28659d4f9 arch-arm: Implement FEAT_XS (#1303)
This patch adds a functional implementation of FEAT_XS. Unless we
operate with DVM enabled, TLBI broadcasting is accomplished in zero time,
so enabling FEAT_XS introduces no timing benefit other than the way it
affects TLB management (invalidation).

Change-Id: I067cb8b7702c59c40c9bbb8da536a0b7f3337b5d

Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-07-02 08:52:59 +01:00
Matt Sinclair
04a3fd5b5d gpu-compute,mem-ruby: Add RubyHitMiss flag for TCP and TCC cache (#1260)
Add hit and miss print for TCP and TCC cache with RubyHitMiss debug flag

Change-Id: I40ae3449020b917f39ac91d29fa4e1dd7c791e7b
2024-06-30 13:32:01 -05:00
Bobby R. Bruce
b3f23830c9 misc: Update versioning for develop branch
Develop for v24.1

Change-Id: I4ef34c4a4ef67d171505ff9380746ae193655305
2024-06-27 23:36:07 -07:00
Bobby R. Bruce
6fcc13cf55 misc: Merge branch stable into develop
This guarantees all changes put on the staging branch and, for whatever
reason, put on stable are on develop. This syncs the branches.

Change-Id: Ib3513f49977bb4ed3046c2d9d6cf162953b15887
2024-06-27 23:27:21 -07:00
Harshil Patel
3acb6e59cf resources: Update elfie.py to work with obtain_resources (#1289)
Change-Id: I08c5e50a150c8434c6c2ca36af81fb6ec3915af8
2024-06-27 20:02:57 -07:00
Jarvis Jia
f56571fed9 Merge branch 'develop' into rubyhitmiss
2024-06-27 21:45:08 +08:00
Rajesh Shashi Kumar
3ce5e0584a arch-arm: This commit fixes a typo in the ARM ldaddalx instruction (#1279)
The acquire-release flavor of the ldadd instruction should read ldaddalx
(e.g. ldaddalb/ldaddalh) according to the specification. However, it is
currently written as ldadd"la"x (e.g. ldaddlab/ldaddlah).

Issue: https://github.com/gem5/gem5/issues/1224
Change-Id: Ib932fa0e572207729c923c27f24c34cc21dff0e5

Co-authored-by: Bobby R. Bruce <bbruce@ucdavis.edu>
2024-06-26 09:03:50 -07:00
Harshil Patel
e0d03fbc2f resources: fix check for additional_params for workloads
Change-Id: I0a4b5f0eef6e2f9faf35cea8130572a066aab6cd
2024-06-26 07:13:04 -07:00
Harshil Patel
144a2071fe resources: fix check for additional_params for workloads
Change-Id: I0a4b5f0eef6e2f9faf35cea8130572a066aab6cd
2024-06-25 16:30:07 -07:00
Harshil Patel
241b8a09df resources: Update client_query to trim gem5 version (#1284)
- gem5 was querying with its full version, `24.0.0.0`, while searching
for resources.
This caused resource lookups to fail on the staging branch.
This change trims the gem5 version to just the major.minor version.
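
As a rough illustration of the trimming described above, the sketch below keeps only the first two dot-separated components. The function name `trim_version` is hypothetical and is not taken from `client_query`.

```py
def trim_version(full_version: str) -> str:
    """Reduce a dotted version string to its major.minor prefix."""
    parts = full_version.split(".")
    return ".".join(parts[:2])


trimmed = trim_version("24.0.0.0")
```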

Change-Id: I30c3a1b38c631981f797ef0fd2b616e6a66ca18e
2024-06-25 09:04:13 -07:00
Harshil Patel
52fde944a5 resources: Update client_query to trim gem5 version (#1284)
- gem5 was querying with its full version, `24.0.0.0`, while searching
for resources.
This caused resource lookups to fail on the staging branch.
This change trims the gem5 version to just the major.minor version.

Change-Id: I30c3a1b38c631981f797ef0fd2b616e6a66ca18e
2024-06-25 09:01:36 -07:00
Jarvis Jia
341c72839b Fix hit issue
Change-Id: I28745489de693591d5ad8453b035a8c782adaf1f
2024-06-24 11:19:51 -07:00
Jarvis Jia
21b69975a6 Fix compilation error
Change-Id: I8273472b8d0cff8c02f2d1e1a9d66599af7c4866
2024-06-24 11:19:51 -07:00
Jarvis Jia
e957a882ed gpu-compute,mem-ruby: Add RubyHitMiss flag for TCP and TCC cache
Add hit and miss print for TCP and TCC cache with RubyHitMiss debug flag

Change-Id: I40ae3449020b917f39ac91d29fa4e1dd7c791e7b
2024-06-24 11:19:51 -07:00
Saúl Adserias
99f58d37da arch-riscv: add agnostic opt to vector tail/mask for arith insts
Change-Id: I693b5f3a6cc8a8f320be26b214fd9b359e541f14
2024-06-24 10:03:52 -07:00
Saúl Adserias
73c364519a arch-riscv: add agnostic opt to vector tail/mask for mem insts
Change-Id: I567a110806b77d5576810706bd3e30185b0e0b75
2024-06-24 10:03:52 -07:00
Mahyar Samani
21bd1c28ab Adding an example for Spatter (#1272)
This change adds a new utility function for processing Spatter traces
into SpatterKernels under parse_kernels.
Additionally, it adds documentation for all the utility functions in
spatter_kernel.py.
Lastly, it adds an example script for running one spatter trace using
SpatterGenerator to the examples.
2024-06-21 02:26:58 -07:00
Mahyar Samani
30bfdc8e52 stdlib: Getter method to get monolith range. (#1273)
This change extends the AbstractMemory class with a getter method that
allows other components to get the memory's range without interleaving.
This method will be useful if other components in the system need to
interleave the memory range differently from the way the memory has
interleaved it.
2024-06-21 02:26:50 -07:00
Mahyar Samani
18bc5227f6 stdlib: Getter method to get monolith range. (#1273)
This change extends the AbstractMemory class with a getter method that
allows other components to get the memory's range without interleaving.
This method will be useful if other components in the system need to
interleave the memory range differently from the way the memory has
interleaved it.
2024-06-21 02:23:58 -07:00
Mahyar Samani
590bb1fbbb Adding an example for Spatter (#1272)
This change adds a new utility function for processing Spatter traces
into SpatterKernels under parse_kernels.
Additionally, it adds documentation for all the utility functions in
spatter_kernel.py.
Lastly, it adds an example script for running one spatter trace using
SpatterGenerator to the examples.
2024-06-21 02:23:41 -07:00
Bobby R. Bruce
d9d7d7646a misc: Update Doxygen version to v24.0.0.0
Change-Id: Ibaa04b09813a1d497727ed9d2a903ee2b3049ffd
2024-06-20 13:53:20 -07:00
Bobby R. Bruce
888bf0d693 base: Update src/base/version.cc for v24.0
Change-Id: Iac980772a42853f9bfbdadb65d5efc3c5fdb6aed
2024-06-20 13:53:07 -07:00
Jason Lowe-Power
013f773d31 arch-riscv: Fix TLB lookup with vaddrs (#1264)
Previously, all of the TLB lookup/insert functions were using the full
virtual addresses even though the variables in the functions said "vpn."
This change explicitly converts the virtual address to the VPN without
any least significant zeros for the offset. I.e., vpn >> page_size.

The main bug solved in this changeset is that the ASID was |'d with the
upper bits of the virtual address, but sometimes these were all 1's.
Therefore, you could get a TLB hit even if the ASID was different.
Interestingly, the page that seemed to cause these issues was a 1 GiB
page.
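
The aliasing described above can be reproduced with a small model. This is a sketch under stated assumptions, not the gem5 TLB code: the key layout, the `ASID_SHIFT` position, the Sv39 address width and both function names are hypothetical and chosen only to show why OR-ing an ASID into bits that are already all 1's loses the ASID.

```py
PAGE_SHIFT = 12   # 4 KiB base pages
VA_BITS = 39      # assumed Sv39 virtual address width
ASID_SHIFT = 48   # hypothetical position of the ASID in the lookup key


def buggy_key(vaddr: int, asid: int) -> int:
    # The full virtual address is used, so a sign-extended address
    # already has every upper bit set and the ASID is absorbed.
    return (asid << ASID_SHIFT) | vaddr


def fixed_key(vaddr: int, asid: int) -> int:
    # Convert to a VPN first: keep only the valid address bits and
    # drop the page offset, then place the ASID above the VPN.
    vpn = (vaddr & ((1 << VA_BITS) - 1)) >> PAGE_SHIFT
    return (asid << (VA_BITS - PAGE_SHIFT)) | vpn


high_vaddr = 0xFFFF_FFC0_0000_0000  # upper bits are all 1's
buggy_collides = buggy_key(high_vaddr, 1) == buggy_key(high_vaddr, 2)
fixed_collides = fixed_key(high_vaddr, 1) == fixed_key(high_vaddr, 2)
```

With the buggy key two different ASIDs map to the same entry for this address; with the VPN-based key they do not.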

This change also starts refactoring some of the page table details to
support sv48 and sv57 page table formats.

In my testing, the Linux kernel boot uses large pages (even OpenSBI uses
large pages), so it seems that large pages also work. However, this
seems like magic to me, so I'm not sure if it's correct.

This change also updates some asserts and debug statements with more
useful debugging information.

Partially fixes #1235. More testing needs to be done to be confident.
2024-06-20 13:24:50 -07:00
Bobby R. Bruce
7137b73ca0 cpu: Fix std::min type mismatch in reg_class.hh (#1266)
Introduced in #1234, this caused compilation to fail on Apple Silicon
systems. This bug is the same as #582, where a more detailed explanation
is provided.
2024-06-20 13:02:08 -07:00
Mahyar Samani
7ff1e381c9 cpu,stdlib: Fix Access Trace for Accessing Indices in SpatterGen (#1258)
This change fixes the way indices are generated in a multi-generator
setup.
Previously, all cores generated the same trace of indices for accessing
the index array; now each core generates an interleaved subset of
indices.
For example, see the traces below (indices into the index array) for a
2-core setup.

Before:
core_0: 0, 1, 2, 3, 4, 5, 6, 7, ...
core_1: 0, 1, 2, 3, 4, 5, 6, 7, ...
After:
core_0: 0, 1, 2, 3, 8, 9, 10, 11, ...
core_1: 4, 5, 6, 7, 12, 13, 14, 15, ...

Additionally, this change fixes the SpatterKernel class in the standard
library to comply with the change in the SpatterGen source code.
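
The "After" traces above can be reproduced with a short sketch. It is not the SpatterGen source: the function name is hypothetical, and the block size of 4 is an assumption inferred from the example traces rather than a documented parameter.

```py
def core_indices(core_id: int, num_cores: int, count: int,
                 block_size: int = 4) -> list:
    """Return the first `count` index-array positions visited by a core.

    Consecutive blocks of `block_size` indices are dealt out to the
    cores in round-robin order.
    """
    out = []
    for i in range(count):
        block, offset = divmod(i, block_size)
        out.append((block * num_cores + core_id) * block_size + offset)
    return out


core_0 = core_indices(0, 2, 8)
core_1 = core_indices(1, 2, 8)
```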
2024-06-20 11:24:44 -07:00
TiredTumblrina
9fb0b18863 gpu-compute,mem,systemc: This commit corrects typos of 'cache' (#1263)
I noticed while using the stable branch that there were a few typos of
the word 'cache' and so I've corrected a few files where I found such
typos.

Change-Id: I7c7f64812039f34fe39d0c45c4f5ce921cba06d0
2024-06-20 09:45:13 -07:00
Jason Lowe-Power
943daeb603 stdlib: Add function to append kernel args (#1262)
Often, you want to add another argument to the default kernel arguments.
This function allows you to do that on the `kernel_disk_workload` board
mixin.
2024-06-20 09:14:55 -07:00
Bobby R. Bruce
1a00ecfaf9 stdlib,configs,tests: Add gem5 MultiSim (MultiProcessing for gem5) (#1167)
This allows for multiple gem5 simulations to be spawned from a single
parent gem5 process, as defined in a single gem5 configuration. In this
design _all_ the `Simulator`s are defined in the simulation script and
then added to the multisim module. For example:

```py
from gem5.simulate.Simulator import Simulator
import gem5.utils.multisim as multisim

# Construct the board[0] and board[1] as you wish here...

simulator1 = Simulator(board=board[0], id="board-1")
simulator2 = Simulator(board=board[1], id="board-2")

multisim.add_simulator(simulator1)
multisim.add_simulator(simulator2)
```

This specifies that two simulations are to be run in parallel in
separate threads: one specified by `simulator1` and another by
`simulator2`. They are then added to MultiSim via the
`multisim.add_simulator` function. The user can specify an id via the
Simulator constructor. This is used to give each process a unique id and
output directory name. Given this, the id should be a helpful name
describing the simulation being specified. If not specified, one is
given automatically.

To run these simulators we use `<gem5 binary> -m gem5.utils.multisim
<script> -p <num_processes>`. Note: multisim is an executable module in
gem5. This is the same module we import into our scripts to add the
simulators. This is an intentionally modular, encapsulated design. When
the module processes a script it will schedule multiple gem5 jobs and,
dependent on the number of processes specified, will create child gem5
processes to process these jobs (jobs are just gem5 simulations in
this case). The `--processes` (`-p`) argument is optional and, if not
specified, the max number of processes which can be run concurrently will
be the number of available threads on the host system.

The id for each process is used to create a subdirectory inside the
`outputdir` (`m5out`) of that id name. E.g., in the example above the
IDs are `board-1` and `board-2`. Therefore the m5out directory will
look as follows:

```sh
- m5out
    - board-1
        - stats.txt
        - config.ini
        - config.json
        - terminal.out
    - board-2
        - stats.txt
        - config.ini
        - config.json
        - terminal.out
```

Each simulation's output is encapsulated inside the subdirectory of the
id name.

If the multisim configuration script is passed directly to gem5 (like a
traditional gem5 configuration script, i.e.: `<gem5 binary> <script>`),
the user may run a single simulation specified in that script by passing
its id as an argument. E.g. `<gem5 binary> <script> board-1` will run
the `board-1` simulation specified in `script`. If no argument is passed,
an Exception is raised asking the user to either specify an id or use the
MultiSim module if multiprocessing is needed.

If the user wants a list of the ids of the simulations specified in a
given MultiSim script, they can get one by passing the `--list` (`-l`)
parameter to the config script. I.e., `<gem5 binary> <script> --list`
will list all the IDs for all the simulations specified in `script`.

This change comes with two new example scripts found in
"configs/example/gem5_library/multisim" to demonstrate multisim in both
an SE and FS mode simulation. Tests have been added which run these
scripts as part of gem5's Daily suite of tests.

Notes
=====

* **Bug fixed**: The `NoCache` classic cache hierarchy has been modified
so the Xbar is no longer set with a `__func__` call. This interfered
with MultiProcessing as this structure is not serializable via Pickle.
This was quite bad design anyway, so it should be changed.

* **Change**: `readfile_contents` parameter previously wrote its value
to a file called "readfile" in the output directory. This has been
changed to write to a file called "readfile_{hash}" with "{hash}" being
a hash of the `readfile_contents`. This ensures that, during multisim
running, this file is not overwritten by other processes.

* **Removal note**: This implementation supersedes the functionality
outlined in 'src/python/gem5/utils/multiprocessing'. As such, this code
has been removed.
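
The `readfile_{hash}` naming described in the notes above can be sketched as follows. The helper name and the choice of SHA-256 are assumptions made for illustration; the source does not say which hash function is used.

```py
import hashlib


def readfile_name(readfile_contents: str) -> str:
    """Derive a per-contents file name so that concurrent simulations
    with different contents never write to the same file."""
    digest = hashlib.sha256(readfile_contents.encode("utf-8")).hexdigest()
    return f"readfile_{digest}"


name_a = readfile_name("echo hello")
name_b = readfile_name("echo goodbye")
```

Identical contents map to the same name, so processes sharing a readfile would write the same bytes to the same path.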

Limitations/Things to Fix/Improve
=================================

* Though each Simulator process has its own output directory (a
subdirectory within m5out, with an ID set by the user unique to that
Simulator), the stdout and stderr are still output to the terminal, not
the output directory. This results in: 1. stdout and stderr data lost
and not recorded for these runs. 2. An incredibly noisy terminal output.
* Each process uses the same cached resources. While there are locks on
resources when downloading, each process will hash the resources it
requires to ensure they are valid. This is very inefficient in cases
where resources are common between processes (e.g., you may have 10
processes each using the same disk image, with each process hashing the
disk image independently to get the same result when validating the
resources).

Change-Id: Ief5a3b765070c622d1f0de53ebd545c85a3f0eee

---------

Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Co-authored-by: Jason Lowe-Power <jason@lowepower.com>
2024-06-18 09:34:39 -07:00
Bobby R. Bruce
3138c8a8b1 gpu-compute,mem-ruby: Revert "Add RubyHitMiss flag for TCP and TCC cache" (#1254)
Reverts gem5/gem5#1226
2024-06-18 07:58:54 -07:00
Bobby R. Bruce
36f73f671d cpu,stdlib: Adding Spatter (#1136)
This PR adds source code for C++ implementation of SpatterGen as well as
SpatterKernel. SpatterGen uses a PyBindMethod to add kernels to the
backend code. This way the process of processing json files could be
offloaded to python. In addition it adds standard library components for
SpatterGenCore and SpatterGen. These two components follow the same
structure as AbstractCore and AbstractProcessor. In addition
spatter_kernel.py adds a definition for SpatterKernel in python to make
adding kernels to C++ easier. Also it adds utility functions for parsing
dictionaries read from json as well as partitioning traces for multicore
setups.
2024-06-17 15:28:45 -07:00
Hoa Nguyen
15e0236a8b arch,cpu,sim: Add mechanism to partially print vector regs (#1234)
Currently, gem5's inst tracer prints the whole vector register container
by default. The size of vector register containers in gem5 is the
maximum size allowed by the ISA. For vector-length agnostic (VLA) vector
registers, this means ARM SVE vector container is 2048 bits long, and
RISC-V vector container is 65535 bits long. Note that VLA implementation
in gem5 allows the vector length to be varied within the limit specified
by the ISAs.

However, in most use cases of gem5, the vector length is much less than
65535 bits. This causes two issues: (1) the vector container requires
allocating and moving around a large amount of unused data while only a
fraction of it is used, and (2) printing the execution trace of a vector
register results in a wall of text with a small amount of useful data.

This change addresses problem (2) by providing a mechanism to limit
the amount of data printed by the instruction tracer. This is done by
adding a function printing the first X bits of a vector register
container, where X is the vector length determined at runtime, as
opposed to the vector container size, which is determined at compilation
time.
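
A minimal sketch of the idea, assuming the register container is modeled as a byte string: only the bytes covered by the runtime vector length are formatted. The function name and the hex output format are hypothetical and do not reflect the tracer's actual interface.

```py
def format_vec_reg(container: bytes, vlen_bits: int) -> str:
    """Format only the first `vlen_bits` bits of a vector container."""
    nbytes = (vlen_bits + 7) // 8   # round up to whole bytes
    return container[:nbytes].hex()


# A large container of which only the first 128 bits are in use.
container = bytes(range(16)) + bytes(8192 - 16)
printed = format_vec_reg(container, 128)
```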

Change-Id: I815fa5aa738373510afcfb0d544a5b19c40dc0c7

---------

Signed-off-by: Hoa Nguyen <hn@hnpl.org>
2024-06-17 14:05:47 -07:00
hahaxxz
fef6a97f93 mem-ruby: This commit fixes MI_example protocol (#1236)
Fix two bugs in MI_example-dir.sm:
1. Directory cannot handle DMA_READ & DMA_WRITE events in M_DRDI state.
2. Directory cannot handle PUTX_NotOwner events in {M_DWR, M_DRD,
M_DRDI, M_DWRI} state.

Github Issue: https://github.com/gem5/gem5/issues/1210

Change-Id: I52a9d674ce0688dcfbbcc2b583f17de95afdeb87
2024-06-17 12:45:11 -07:00
Hoa Nguyen
500da4306b arch: Mark FailUnimplemented instructions as Invalid instructions (#1247)
This is a follow-up on the discussion here [1].

The IsInvalid flag was previously defined as marking an instruction that
does not appear in the ISA. However, a micro-architecture can choose not
to recognize an instruction and raise an illegal instruction fault even
if the instruction is in the ISA.

This change modifies the definition of an Invalid instruction such that,
if a StaticInst instruction is marked as IsInvalid, it means the
instruction is not recognized by the decoder. This means that any
instructions recognized by the decoder are not invalid, even if the
instruction is not in the official ISA spec; e.g., m5
pseudo-instructions.

Note that instructions that are recognized by the decoder but are chosen
to act as a nop are not invalid. This applies to WarnUnimplemented
instructions, e.g. hint instructions.

[1] https://github.com/gem5/gem5/pull/1071

Change-Id: I1371b222d8b06793d47f434d0f148c5571672068

Signed-off-by: Hoa Nguyen <hn@hnpl.org>
2024-06-17 12:44:05 -07:00
Giacomo Travaglini
2804311f7b cpu-o3: Revert "Do not set Executed on load instruction to be replayed" (#1251)
Reverts gem5/gem5#1182

This is breaking O3 execution. Investigating the matter
2024-06-17 12:24:43 -07:00
Matt Sinclair
6776bebbf6 gpu-compute,mem-ruby: Add RubyHitMiss flag for TCP and TCC cache (#1226)
Add hit and miss print for TCP and TCC cache with RubyHitMiss debug flag

Change-Id: I4430532b901811e03d9b077b61e2eca4557b34e1
2024-06-17 12:47:47 -05:00
Jarvis Jia
3a2bf47d57 Add default value and change Ruby address format specifier
Change-Id: I8fbaf34745e90589e610d3b9bd423937e7ebdc3d
2024-06-17 03:27:25 -05:00
Jarvis Jia
edb2e76077 Merge branch 'develop' into rubyhitmiss
2024-06-17 15:57:50 +08:00
Matthew Poremba
2b0ca93517 gpu-compute: Fix architected flat scratch
This is currently written to the SRF, which is incorrect, as the physical
register number can be clobbered by another wavefront if registers get
renamed to the physical register number.

Fix this by actually architecting the register, i.e., there is a
dedicated "hardware" register in the wavefront class.

Change-Id: I94e9e463eed348b2928cae884c1c20566c00984d
2024-06-15 15:46:33 -07:00
Matthew Poremba
2f5842d253 arch-vega: Add valid flag to ds_swizzle_b32
Currently the flag is just Load and there is a long comment explaining
why. This does not meet any of the scoreboard check requirements:

https://github.com/gem5/gem5/blob/develop/src/gpu-compute/scoreboard_check_stage.cc#L230-L241

Add a generic ALU flag as well so the instruction executes instead of
panicking.

Change-Id: I54b2d20d47fad5e8f05f927328433aab7db7d862
2024-06-15 14:28:59 -07:00