Commit Graph

21852 Commits

Author SHA1 Message Date
Mahyar Samani
7ff1e381c9 cpu,stdlib: Fix Access Trace for Accessing Indices in SpatterGen (#1258)
This change fixes the way indices are generated in a multi generator
setup.
It changes it from all cores generating the same trace of indices for
accessing the index array to each core generating an interleaved subset
of indices.
For an example look below for traces (indices to index array) in a 2
core setup.

Before:
core_0: 0, 1, 2, 3, 4, 5, 6, 7, ...
core_1: 0, 1, 2, 3, 4, 5, 6, 7, ...
After:
core_0: 0, 1, 2, 3, 8, 9, 10, 11, ...
core_1: 4, 5, 6, 7, 12, 13, 14, 15, ...

Additionally, this change fixes the SpatterKernel class in the standard
library to comply with the change in the SpatterGen source code.
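
The interleaving shown in the traces above can be sketched as follows. This is only an illustration, not the SpatterGen code: `interleaved_indices` is a hypothetical helper, and the block size of 4 is inferred from the example traces.

```python
def interleaved_indices(core_id, num_cores, block_size, count):
    """Return the first `count` index-array positions that `core_id` accesses."""
    indices = []
    block = 0
    while len(indices) < count:
        # Blocks are dealt out round-robin: core 0 takes block 0, core 1
        # takes block 1, ..., then core 0 takes block `num_cores`, and so on.
        start = (block * num_cores + core_id) * block_size
        indices.extend(range(start, start + block_size))
        block += 1
    return indices[:count]


# Two-core setup, as in the example above.
for core in range(2):
    print(f"core_{core}:", interleaved_indices(core, 2, 4, 8))
```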
2024-06-20 11:24:44 -07:00
Matthew Poremba
ed860dfe54 configs: Check before use replacement policy options (#1261)
Rather than adding the options to *every* config that might be using
GPU_VIPER.py, just change the Ruby config to check if the option is
available before trying to use it. Otherwise, reverts to what was the
default on stable.
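
A minimal sketch of the "check before use" pattern, assuming the options arrive as an `argparse.Namespace`. The option name `tcp_rp` and the fallback value are placeholders chosen for illustration; they are not taken from GPU_VIPER.py.

```python
from argparse import Namespace

# Placeholder fallback; the real default is whatever stable used.
FALLBACK_POLICY = "TreePLRU"


def get_replacement_policy(options, name, fallback=FALLBACK_POLICY):
    """Return options.<name> if the calling config defined it, else the fallback."""
    return getattr(options, name, fallback)


with_option = Namespace(tcp_rp="LRU")   # config that defines the option
without_option = Namespace()            # config that never added it

print(get_replacement_policy(with_option, "tcp_rp"))
print(get_replacement_policy(without_option, "tcp_rp"))
```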

Change-Id: Ia6f1d0827d489ee2a35c598b644461cbff59e247
2024-06-20 09:50:29 -07:00
TiredTumblrina
9fb0b18863 gpu-compute,mem,systemc: This commit corrects typos of 'cache' (#1263)
I noticed while using the stable branch that there were a few typos of
the word 'cache' and so I've corrected a few files where I found such
typos.

Change-Id: I7c7f64812039f34fe39d0c45c4f5ce921cba06d0
2024-06-20 09:45:13 -07:00
Jason Lowe-Power
943daeb603 stdlib: Add function to append kernel args (#1262)
Often, you want to add another argument to the default kernel arguments.
This function allows you to do that on the `kernel_disk_workload` board
mixin.
2024-06-20 09:14:55 -07:00
Bobby R. Bruce
25d614e4ce tests: Fix x86_boot_exit_run.py 'set_max_ticks' typo (#1267) 2024-06-20 00:31:23 -07:00
Ivana Mitrovic
e88f0944e3 util: Bump urllib3 in gem5-resource-manager (#1257)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.0.7 to 2.2.2.

Change-Id: I218236ff9ebe99839e417b67e740e6f98c0ee473
2024-06-18 11:05:13 -07:00
Bobby R. Bruce
9fe2bc9edc util-docker: Update devcontainer to use Ubuntu 24.04 (#1256)
Change-Id: I0e0dbaca2194c7f0ff5de54a49888da1c938c2de
2024-06-18 09:35:18 -07:00
Bobby R. Bruce
1a00ecfaf9 stdlib,configs,tests: Add gem5 MultiSim (MultiProcessing for gem5) (#1167)
This allows for multiple gem5 simulations to be spawned from a single
parent gem5 process, as defined in a single gem5 configuration. In this
design _all_ the `Simulator`s are defined in the simulation script and
then added to the multisim module. For example:

```py
from gem5.simulate.Simulator import Simulator
import gem5.utils.multisim as multisim

# Construct the board[0] and board[1] as you wish here...

simulator1 = Simulator(board=board[0], id="board-1")
simulator2 = Simulator(board=board[1], id="board-2")

multisim.add_simulator(simulator1)
multisim.add_simulator(simulator2)
```

This specifies that two simulations are to be run in parallel in
separate processes: one specified by `simulator1` and another by
`simulator2`. They are then added to MultiSim via the
`multisim.add_simulator` function. The user can specify an id via the
Simulator constructor. This is used to give each process a unique id and
output directory name. Given this, the id should be a helpful name
describing the simulation being specified. If not specified, one is
assigned automatically.

To run these simulators we use `<gem5 binary> -m gem5.utils.multisim
<script> -p <num_processes>`. Note: multisim is an executable module in
gem5. This is the same module we import in our scripts to add the
simulators. This is an intentionally modular, encapsulated design. When
the module processes a script it will schedule multiple gem5 jobs and,
dependent on the number of processes specified, will create child gem5
processes to process these jobs (jobs are just gem5 simulations in
this case). The `--processes` (`-p`) argument is optional and, if not
specified, the max number of processes which can be run concurrently will
be the number of available threads on the host system.

The id for each process is used to create a subdirectory inside the
`outputdir` (`m5out`) of that id name. E.g., in the example above the
IDs are `board-1` and `board-2`. Therefore the `m5out` directory will
look as follows:

```sh
- m5out
    - board-1
        - stats.txt
        - config.ini
        - config.json
        - terminal.out
    - board-2
        - stats.txt
        - config.ini
        - config.json
        - terminal.out
```

Each simulation's output is encapsulated inside the subdirectory of the
id name.

If the multisim configuration script is passed directly to gem5 (like a
traditional gem5 configuration script, i.e.: `<gem5 binary> <script>`),
the user may run a single simulation specified in that script by passing
its id as an argument. E.g. `<gem5 binary> <script> board-1` will run
the `board-1` simulation specified in `script`. If no argument is passed,
an Exception is raised asking the user to either specify an id or use the
MultiSim module if multiprocessing is needed.

If the user desires a list of ids of the simulations specified in a
given MultiSim script, they can get one by passing the `--list` (`-l`)
parameter to the config script. I.e., `<gem5 binary> <script> --list`
will list all the IDs for all the simulations specified in `script`.
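
The command-line behaviour described in the two paragraphs above can be sketched with `argparse`. This is an illustration under stated assumptions, not the multisim module itself: `select_simulation` is a hypothetical helper and the exception type and messages are invented.

```python
import argparse


def select_simulation(argv, known_ids):
    """Mimic `<script> <id>` and `<script> --list` for a set of known ids."""
    parser = argparse.ArgumentParser()
    parser.add_argument("sim_id", nargs="?", help="id of the simulation to run")
    parser.add_argument("-l", "--list", action="store_true", help="list all ids")
    args = parser.parse_args(argv)

    if args.list:
        # `--list` only reports the ids; nothing is run.
        return list(known_ids)
    if args.sim_id is None:
        raise ValueError("Pass a simulation id, or use the MultiSim module.")
    if args.sim_id not in known_ids:
        raise ValueError(f"Unknown simulation id: {args.sim_id}")
    return args.sim_id


ids = ["board-1", "board-2"]
print(select_simulation(["board-1"], ids))
print(select_simulation(["--list"], ids))
```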

This change comes with two new example scripts found in
'configs/example/gem5_library/multisim' to demonstrate multisim in both
an SE and FS mode simulation. Tests have been added which run these
scripts as part of gem5's Daily suite of tests.

Notes
=====

* **Bug fixed**: The `NoCache` classic cache hierarchy has been modified
so the Xbar is no longer set with a `__func__` call. This interfered
with MultiProcessing as this structure is not serializable via Pickle.
This was quite bad design anyway, so it should be changed.

* **Change**: `readfile_contents` parameter previously wrote its value
to a file called "readfile" in the output directory. This has been
changed to write to a file called "readfile_{hash}" with "{hash}" being
a hash of the `readfile_contents`. This ensures that, during multisim
running, this file is not overwritten by other processes.

* **Removal note**: This implementation supersedes the functionality
outlined in 'src/python/gem5/utils/multiprocessing'. As such, this code
has been removed.
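
The `readfile_{hash}` naming described in the **Change** note can be sketched as follows. The commit does not say which hash function is used, so SHA-256 is an assumption here, and `readfile_name` and `write_readfile` are hypothetical helpers.

```python
import hashlib
import tempfile
from pathlib import Path


def readfile_name(contents: str) -> str:
    """Derive a file name from the contents (SHA-256 is an assumed choice)."""
    digest = hashlib.sha256(contents.encode("utf-8")).hexdigest()
    return f"readfile_{digest}"


def write_readfile(outdir: Path, contents: str) -> Path:
    """Write `contents` to a content-addressed file inside `outdir`."""
    path = outdir / readfile_name(contents)
    path.write_text(contents)
    return path


with tempfile.TemporaryDirectory() as tmp:
    first = write_readfile(Path(tmp), "echo board-1")
    second = write_readfile(Path(tmp), "echo board-2")
    # Different contents map to different files, so neither is overwritten.
    print(first.name != second.name)
```

Processes that pass identical contents write the same bytes to the same name, so a collision between them is harmless.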

Limitations/Things to Fix/Improve
=================================

* Though each Simulator process has its own output directory (a
subdirectory within m5out, with an ID set by the user unique to that
Simulator), the stdout and stderr are still output to the terminal, not
the output directory. This results in: 1. stdout and stderr data lost
and not recorded for these runs. 2. An incredibly noisy terminal output.
* Each process uses the same cached resources. While there are locks on
resources when downloading, each process will hash the resources it
requires to ensure they are valid. This is very inefficient in cases
where resources are common between processes (e.g., you may have 10
processes each using the same disk image, with each process hashing the
disk image independently to give the same result to validate the
resources).

Change-Id: Ief5a3b765070c622d1f0de53ebd545c85a3f0eee

---------

Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Co-authored-by: Jason Lowe-Power <jason@lowepower.com>
2024-06-18 09:34:39 -07:00
Bobby R. Bruce
3138c8a8b1 gpu-compute,mem-ruby: Revert "Add RubyHitMiss flag for TCP and TCC cache" (#1254)
Reverts gem5/gem5#1226
2024-06-18 07:58:54 -07:00
Bobby R. Bruce
36f73f671d cpu,stdlib: Adding Spatter (#1136)
This PR adds source code for C++ implementation of SpatterGen as well as
SpatterKernel. SpatterGen uses a PyBindMethod to add kernels to the
backend code. This way the process of processing json files could be
offloaded to python. In addition it adds standard library components for
SpatterGenCore and SpatterGen. These two components follow the same
structure as AbstractCore and AbstractProcessor. In addition
spatter_kernel.py adds a definition for SpatterKernel in python to make
adding kernels to C++ easier. Also it adds utility functions for parsing
dictionaries read from json as well as partitioning traces for multicore
setups.
2024-06-17 15:28:45 -07:00
Hoa Nguyen
15e0236a8b arch,cpu,sim: Add mechanism to partially print vector regs (#1234)
Currently, gem5's inst tracer prints the whole vector register container
by default. The size of vector register containers in gem5 is the
maximum size allowed by the ISA. For vector-length agnostic (VLA) vector
registers, this means ARM SVE vector container is 2048 bits long, and
RISC-V vector container is 65535 bits long. Note that VLA implementation
in gem5 allows the vector length to be varied within the limit specified
by the ISAs.

However, in most use cases of gem5, the vector length is much less than
65535 bits. This causes two issues: (1) the vector container requires
allocating and moving around a large amount of unused data while only a
fraction of it is used, and (2) printing the execution trace of a vector
register results in a wall of text with a small amount of useful data.

This change addresses problem (2) by providing a mechanism to limit
the amount of data printed by the instruction tracer. This is done by
adding a function printing the first X bits of a vector register
container, where X is the vector length determined at runtime, as
opposed to the vector container size, which is determined at compilation
time.
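
As a rough illustration of printing only the first X bits of a container, here is a sketch that models the container as a `bytes` object. `format_vec_reg` is a hypothetical helper; the byte order and hex formatting are assumptions and do not reflect the tracer's actual output format.

```python
def format_vec_reg(container: bytes, vlen_bits: int) -> str:
    """Format only the first `vlen_bits` bits of a vector register container."""
    if vlen_bits % 8 != 0:
        raise ValueError("this sketch only handles whole bytes")
    if vlen_bits > len(container) * 8:
        raise ValueError("vector length exceeds container size")
    used = container[: vlen_bits // 8]  # drop the unused tail of the container
    return used.hex()


# A 2048-bit container of which only the first 32 bits are in use.
container = bytes([1, 2, 3, 4]) + bytes(252)
print(len(container.hex()))           # whole container: 512 hex digits
print(format_vec_reg(container, 32))  # runtime vector length only
```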

Change-Id: I815fa5aa738373510afcfb0d544a5b19c40dc0c7

---------

Signed-off-by: Hoa Nguyen <hn@hnpl.org>
2024-06-17 14:05:47 -07:00
hahaxxz
fef6a97f93 mem-ruby: This commit fixes MI_example protocol (#1236)
fix two bugs in MI_example-dir.sm:
1. Directory cannot handle DMA_READ & DMA_WRITE events in M_DRDI state.
2. Directory cannot handle PUTX_NotOwner events in {M_DWR, M_DRD,
M_DRDI, M_DWRI} state.

Github Issue: https://github.com/gem5/gem5/issues/1210

Change-Id: I52a9d674ce0688dcfbbcc2b583f17de95afdeb87
2024-06-17 12:45:11 -07:00
Hoa Nguyen
500da4306b arch: Mark FailUnimplemented instructions as Invalid instructions (#1247)
This is a follow-up on the discussion here [1].

The IsInvalid flag previously marked an instruction that does not
appear in the ISA. However, a micro-architecture can choose not to
recognize an instruction and raise an illegal instruction fault even if
the instruction is in the ISA.

This change modifies the definition of an Invalid instruction such that,
if a StaticInst instruction is marked as IsInvalid, it means the
instruction is not recognized by the decoder. This means that any
instruction recognized by the decoder is not invalid, even if the
instruction is not in the official ISA spec; e.g., m5
pseudo-instructions.

Note that instructions that are recognized by the decoder but are chosen
to act as a nop are not invalid. This applies to WarnUnimplemented
instructions, e.g. hint instructions.
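
The resulting classification can be summarised in a small sketch. The enum and `is_invalid` below are hypothetical names used only to restate the rules in this message; they are not gem5 code.

```python
from enum import Enum, auto


class DecodeOutcome(Enum):
    """Hypothetical categories for how a decoder treats an instruction."""
    IMPLEMENTED = auto()          # includes m5 pseudo-instructions
    WARN_UNIMPLEMENTED = auto()   # recognized, but acts as a nop (e.g. hints)
    FAIL_UNIMPLEMENTED = auto()   # in the ISA, but raises an illegal inst fault
    UNKNOWN = auto()              # no matching encoding at all


def is_invalid(outcome: DecodeOutcome) -> bool:
    """An instruction is invalid iff the decoder does not recognize it."""
    return outcome in (DecodeOutcome.FAIL_UNIMPLEMENTED, DecodeOutcome.UNKNOWN)


for outcome in DecodeOutcome:
    print(outcome.name, is_invalid(outcome))
```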

[1] https://github.com/gem5/gem5/pull/1071

Change-Id: I1371b222d8b06793d47f434d0f148c5571672068

Signed-off-by: Hoa Nguyen <hn@hnpl.org>
2024-06-17 12:44:05 -07:00
Giacomo Travaglini
2804311f7b cpu-o3: Revert "Do not set Executed on load instruction to be replayed" (#1251)
Reverts gem5/gem5#1182

This is breaking O3 execution. Investigating the matter
2024-06-17 12:24:43 -07:00
Matt Sinclair
6776bebbf6 gpu-compute,mem-ruby: Add RubyHitMiss flag for TCP and TCC cache (#1226)
Add hit and miss print for TCP and TCC cache with RubyHitMiss debug flag

Change-Id: I4430532b901811e03d9b077b61e2eca4557b34e1
2024-06-17 12:47:47 -05:00
Matthew Poremba
50e4209a4a arch-vega: Various MI300 fixes for PyTorch tests (#1249)
- Fix address calculation issue with scratch_* instructions when SVE bit
is 0.
- Fix ds_swizzle_b32 not mapping to execution unit.
- Implement VOP3 V_FMAC_F32.
- Fix architected scratch address register being clobbered.

Tested with MNIST from PyTorch quickstart tutorial and nanoGPT on
mi300.py.
2024-06-17 07:59:47 -07:00
Jarvis Jia
3a2bf47d57 Add default value and change Ruby address format specifier
Change-Id: I8fbaf34745e90589e610d3b9bd423937e7ebdc3d
2024-06-17 03:27:25 -05:00
Jarvis Jia
edb2e76077 Merge branch 'develop' into rubyhitmiss 2024-06-17 15:57:50 +08:00
Matthew Poremba
2b0ca93517 gpu-compute: Fix architected flat scratch
The code currently writes to the SRF, which is incorrect: the physical
register number can be clobbered by another wavefront if registers get
renamed to the physical register number.

Fix this by actually architecting the register, i.e., there is a
dedicated "hardware" register in the wavefront class.

Change-Id: I94e9e463eed348b2928cae884c1c20566c00984d
2024-06-15 15:46:33 -07:00
Matthew Poremba
2f5842d253 arch-vega: Add valid flag to ds_swizzle_b32
Currently the flag is just Load and there is a long comment explaining
why. This does not meet any of the scoreboard check requirements:

https://github.com/gem5/gem5/blob/develop/src/gpu-compute/scoreboard_check_stage.cc#L230-L241

Add a generic ALU flag as well so the instruction executes instead of
panicking.

Change-Id: I54b2d20d47fad5e8f05f927328433aab7db7d862
2024-06-15 14:28:59 -07:00
Matthew Poremba
42369eab2c arch-vega: Implement MI300 FLAT SVE bit
For scratch instructions only, this bit specifies if an offset in a VGPR
should be used for address calculation. This is new in MI300 and was
previously the LDS bit. The LDS bit is rarely used and in fact gem5 does
not even check this bit.

This fixes a bug when SADDR == 0x7f (i.e., no SGPR should be used) where
a VGPR was being added to the address when it should have been ignored.

Change-Id: I9864379692df6795b25b58b98825da05d18fc5db
2024-06-15 14:28:59 -07:00
Matthew Poremba
1dab4be002 arch-vega: Implement VOP3 V_FMAC_F32
A version of V_FMAC_F32 with extra modifiers from VOP3 format.

Change-Id: Ib6b41b0a3ceb91269b91a0287dfc94bc73e4d217
2024-06-15 14:28:58 -07:00
Matthew Poremba
f91d14fe46 gpu-compute: Add MFMA stats (#1248)
Add dynamic instruction counts for MFMAs.

Change-Id: I976b01344577cf011aeb3dd648a8c0017281c4e3
2024-06-15 13:04:00 -07:00
Mahyar Samani
d661023de4 stdlib: Adding SpatterGenCore and SpatterGen
This change adds code for SpatterGenCore and SpatterGen as well
as SpatterKernel to the standard library. SpatterGenCore and
SpatterGen follow the same structure as AbstractCore and
AbstractProcessor. spatter_kernel.py adds utility functions
to parse dictionaries as well as partition a list into
multiple lists through interleaving to be used when setting up
a multicore SpatterGen.

Change-Id: I003553e97f901c0724f5feac0bb6e21a020bd6ad
2024-06-14 13:44:34 -07:00
Mahyar Samani
6695e5ef70 cpu: Adding SpatterGen
This change adds source code for SpatterGen ClockedObject.
The set of source code pushed includes code for SpatterKernel
that tracks whether information is being gathered or scattered
as well as the list of indices to be accessed. This model
has PyBindMethod to add SpatterKernels from python.
This way all the preparations for kernels can be done in python.
SpatterGen has a few parameters that model limits on a few of
hardware resources in the backend of a processor, e.g. number
of functional units to calculate effective address, the latency
of calculating effective address, number of integer registers.

Change-Id: I451ffb385180a914e884cab220928c5f1944b2e3
2024-06-14 10:45:09 -07:00
Minje Jun
b8e21a2d32 cpu-o3: Do not set Executed on load instruction to be replayed (#1182)
A load instruction can be replayed when
1) it's strictly ordered or
2) it falls into load-store forwarding mismatch.

Case 1 was considered in the executeLoad function but case 2 wasn't.
This causes the case-2 replayed load instruction to violate the
assertion condition "assert(!load_inst->isExecuted())" in LSQUnit::read.
This commit fixes the problem by also handling case 2 in
LSQUnit::executeLoad.

Co-authored-by: Minje Jun <minje.jun@samsung.com>
2024-06-14 10:12:26 -07:00
Matthew Poremba
3cf638e217 gpu-compute, util-m5: add GPU kernel exit events (#1217)
The GPUFS scripts include support for dumping and resetting
stats at kernel boundaries by identifying specific GPU kernel
exit events. This commit extends that support to GPU SE mode.

Change-Id: I662233ae71e2987d90af1fd0100e29036b2ef1c6
2024-06-14 08:13:27 -07:00
Jason Lowe-Power
21ffd91529 cpu,arch: Add IsInvalid flag to Unknown insts (#1071)
The IsInvalid flag indicates that the static instruction is not part of
the executing ISA and not part of m5's pseudo-instructions. This flag
provides a way to recognize an illegal instruction at the decode stage.
2024-06-13 16:26:35 -07:00
Matthew Poremba
b3d9dc42d4 configs: Add replacement policy options for GPUFS (#1230)
GPU_VIPER.py was modified to use these options but they did not exist,
breaking GPUFS. This commit adds them to fix the issue.

Change-Id: I0095f400ea606c4e8d91a41870ef208465cef803
2024-06-13 11:23:50 -07:00
Jarvis Jia
87c0d7732c Merge branch 'develop' into rubyhitmiss 2024-06-12 17:30:35 -04:00
Jarvis Jia
edfc139c40 Change black format
Change-Id: I3733b31baf187e0d3d38d971d9423a1b1afe2296

gpu-compute: add GPU RubyHitMiss for TCP and TCC

Change-Id: I4430532b901811e03d9b077b61e2eca4557b34e1

gpu-compute: Add RubyHitMiss flag for TCP and TCC cache

Change-Id: I4e5d1127c84b9eb1060ec9ba0b6638267449eda5

gpu-compute: Add RubyHitMiss flag for TCP and TCC cache

Change-Id: I4e5d1127c84b9eb1060ec9ba0b6638267449eda5

Remove space

Change-Id: I401f528c6f128ba0956bdbc232e8f2ae37bf648c
2024-06-12 16:04:36 -05:00
Jarvis Jia
b6b2e8c6c5 Black format
Change-Id: If224c106262bae25127675160ea78386eedace3b
2024-06-12 15:57:04 -05:00
Jarvis Jia
0ebcddea95 Update apu_se.py to remove part not needed
Change-Id: I06df4e0a67ccd2b7a45296ff65bf26c2b465a934
2024-06-12 15:54:13 -05:00
Matthew Poremba
be0a7937c1 mem-ruby: Fix deadlock in GPU_VIPER when issuing atomic requests (#1216)
When a compute unit issues several requests to the same line,
the requests wait in the L2 if it is a writeback cache. If the line is
invalid initially and the first request is atomic in nature, the L2
cache issues a request to main memory. On data return, the cache line
transitions to M but doesn't wake up the other requests, resulting in
a deadlock. This commit adds a wakeup call on data return for atomics
and fixes potential deadlocks.
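
A toy model of the stall and the wakeup, written only to illustrate the description above. It is not the GPU_VIPER-TCC.sm protocol: the class, the `PENDING` state name, and the request names are all hypothetical.

```python
from collections import deque


class ToyL2Line:
    """Toy model of a single L2 cache line with stalled requests."""

    def __init__(self, wake_on_atomic_return):
        self.state = "I"              # invalid
        self.in_flight = None         # request currently waiting on memory
        self.waiting = deque()        # requests stalled behind it
        self.completed = []
        self.wake_on_atomic_return = wake_on_atomic_return

    def request(self, name):
        if self.state == "M":
            self.completed.append(name)
        elif self.state == "I":
            # First request to an invalid line is sent to main memory.
            self.state = "PENDING"    # hypothetical transient state
            self.in_flight = name
        else:
            self.waiting.append(name)

    def data_return(self):
        self.state = "M"
        self.completed.append(self.in_flight)
        self.in_flight = None
        if self.wake_on_atomic_return:
            # The fix: wake everything that stalled on this line.
            while self.waiting:
                self.completed.append(self.waiting.popleft())


def run(wake):
    line = ToyL2Line(wake)
    for req in ("atomic_0", "load_1", "load_2"):
        line.request(req)
    line.data_return()
    return line.completed, list(line.waiting)


print(run(False))  # stalled requests are never woken
print(run(True))   # all requests complete
```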
2024-06-12 10:10:32 -07:00
Harshil Patel
74afea471d cpu: Revert "Don't change to suspend if the thread status is halted" (#1225)
Reverts gem5/gem5#1039
2024-06-12 00:20:06 -07:00
Bobby R. Bruce
f9abf6bb08 stdlib: Improve gem5 PyStats (#996)
This PR incorporates numerous improvements and fixes to the gem5
PyStats. This includes:

* PyStats now support SimObject Vectors. The PyStats representing them
are subscriptable and therefore accessible by index, e.g.,
* Adds the `SparseHist` PyStats.
* Adds the `Vector2d` to PyStats.
* The `Distribution` PyStats is fixed to be a vector of Scalars.
* Tests added for the PyStat's Vector and bugs fixed.
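
What "subscriptable" means for a vector of stats can be shown with a minimal sketch. `ToySimObjectVector` is a hypothetical stand-in written for illustration; it is not the PyStats class and makes no claim about its fields.

```python
class ToySimObjectVector:
    """Hypothetical container whose members are reachable by index."""

    def __init__(self, members):
        self._members = list(members)

    def __getitem__(self, index):
        # Supporting __getitem__ is what makes `simobjectvec[0]` work.
        return self._members[index]

    def __len__(self):
        return len(self._members)


simobjectvec = ToySimObjectVector([{"numCycles": 100}, {"numCycles": 250}])
print(simobjectvec[0]["numCycles"])
print(len(simobjectvec))
```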
2024-06-12 00:19:08 -07:00
Bobby R. Bruce
e03a5f78d1 misc,tests: Revert merge version to 'v4' from 'v4.0.0'
'v4.0.0' wasn't working. The following error occurred:

```
Can't find 'action.yml', 'action.yaml' or 'Dockerfile' for action 'actions/upload-artifact/merge@v4.0.0'.
```

Change-Id: I658b0fe292df029501fbc1286acb06f4014ae4e1
2024-06-12 00:15:06 -07:00
Bobby R. Bruce
261490f23c misc,tests: Revert merge version to 'v4' from 'v4.0.0'
'v4.0.0' wasn't working. The following error occurred:

```
Can't find 'action.yml', 'action.yaml' or 'Dockerfile' for action 'actions/upload-artifact/merge@v4.0.0'.
```

Change-Id: I658b0fe292df029501fbc1286acb06f4014ae4e1
2024-06-12 00:14:27 -07:00
Vishnu Ramadas
42b9a9666e mem-ruby: Add instSeqNum to atomic responses from GPU L2 caches
This commit adds instSeqNum to the atomic responses in
GPU_VIPER-TCC.sm. This will be useful when debugging issues related to
GPU atomic transactions

Change-Id: Ic05c8e1a1cb230abfca2759b51e5603304aadaa3
2024-06-11 20:35:43 -05:00
Vishnu Ramadas
943d1f1453 mem-ruby: Fix deadlock in GPU_VIPER when issuing atomic requests
When a compute unit issues several requests to the same line,
the requests wait in the L2 if it is a writeback cache. If the line is
invalid initially and the first request is atomic in nature, the L2
cache issues a request to main memory. On data return, the cache line
transitions to M but doesn't wake up the other requests, resulting in
a deadlock. This commit adds a wakeup call on data return for atomics
and fixes potential deadlocks.

Change-Id: I8200ce6e77da7c8b4db285c0cc8b8ca0dfa7d720
2024-06-11 20:33:46 -05:00
Bobby R. Bruce
7e45ec0ff0 stdlib: Fix m5.ext.pystats __init__.py
Addresses Jason's complaint that wildcard imports should be avoided, in
accordance with PEP 8:
https://github.com/gem5/gem5/pull/996#discussion_r1621051601.

Change-Id: I72266df43d3ec4ede3f45c3e34e2e05e1990bd6b
2024-06-11 16:26:24 -07:00
Bobby R. Bruce
26a1d2ff0b misc,tests: Update daily test artifact actions to v4.0.0
Change-Id: I711fa36639e925ce958e0484a31ee6a4dde87dbe
2024-06-11 15:44:07 -07:00
Bobby R. Bruce
8fc4d3f793 misc,tests: Update daily test artifact actions to v4.0.0
Change-Id: I711fa36639e925ce958e0484a31ee6a4dde87dbe
2024-06-11 15:43:40 -07:00
Matt Sinclair
8a44e97a10 gpu-compute: Added functions to choose replacement policies for GPU (#1213)
Adding RP_choose functions to change replacement policies among
TreePLRU, LRU, FIFO, LFU, LIP, MRU, NRU, RRIP, SecondChance and ShiPMem
replacement policies for TCC, TCP and SQC caches for GPU
2024-06-11 15:08:42 -05:00
Hoa Nguyen
d528a6bd2d arch: Flag all ISAs Unknown instruction as IsInvalid
Change-Id: I096138a157c4e2063c5f4f4324c21c1463dddb65
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
2024-06-11 18:48:29 +00:00
Hoa Nguyen
369029d2be cpu: Add IsInvalid flag to StaticInstFlags
The IsInvalid flag indicates that the static instruction is not part
of the executing ISA and not part of m5's pseudo-instructions. This
flag provides a way to recognize an illegal instruction at the decode
stage.

Change-Id: I2779c6edcd8c5e6a77ea11cad3ff73bacb79d800
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
2024-06-11 18:48:29 +00:00
Harry Chiang
d198380489 base: Fix uninitialized variable warning in symtab.test.cc (#1221)
This warning appears when I add warning-related flags to LINKFLAGS
and turn on LTO to build unit tests.
2024-06-11 10:53:00 -07:00
Jarvis Jia
4fea51b598 Black format change
Change-Id: I95cbf5b97601ef3b6ca26bc1a1835305929ffcab
2024-06-10 22:52:56 -05:00
Jarvis Jia
8e268d42e2 gpu-compute: Provided m5ops support for gpu
Adding m5 stat dump and reset to the Python script through different
exit events

Change-Id: I662233ae71e2987d90af1fd0100e29036b2ef1c6
2024-06-10 20:56:08 -05:00
Jarvis Jia
cf5e316a92 Change black format
Change-Id: I3733b31baf187e0d3d38d971d9423a1b1afe2296
2024-06-10 16:33:18 -05:00