Commit Graph

15229 Commits

Author SHA1 Message Date
Giacomo Travaglini
c9d9108978 arch-arm: MISCREG_AT_S1E2R/W are executable from S state (#1322)
Change-Id: Ieaebdf0d62b5115f8085f478b2da105633b6a26a

Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-07-04 09:37:17 +01:00
Giacomo Travaglini
f3e3c60805 arch-arm: Proper support for NonSecure IPA space in Secure state
Change-Id: Ie2e2278ecdc5213db74999e3561b2918937c2c2e
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-07-02 13:16:13 +01:00
Giacomo Travaglini
eb400e773b arch-arm: Remove makeStage2 from TLBIOp
Change-Id: I25276e4b5b7c491e69208044ceb193c67ddfd91c
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-07-02 13:15:49 +01:00
Giacomo Travaglini
49ca08b01a arch-arm: Add isStage2 qualifier to the LongDecriptor
We are currently using the LongDecriptor for both stage1
and stage2 translations. There are several cases where
the bitfield meaning changes depending on the translation
stage.

Change-Id: Ic33d9ef225a57fd79ce2b4bf47896aeb6bdd8d9c
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-07-02 13:15:31 +01:00
Giacomo Travaglini
9cce68ca71 arch-arm: Replace isSecure boolean with SecurityState enum
Change-Id: If01b8b2811b2c028e669ea3700174c7945b07a06
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-07-02 12:45:24 +01:00
Alexander Richardson
d5c0383887 arch-arm: support 64-bit PMCCNTR from AArch32 (#1304)
For ARMv8 CPUs this register allows reading a 64-bit cycle counter in
from 32-bit execution state.

Change-Id: I7cd9e2711ada5156920440cc3c89e7a74ca54a49
2024-07-02 08:59:44 +01:00
Giacomo Travaglini
b28659d4f9 arch-arm: Implement FEAT_XS (#1303)
This patch is adding a functional implementation of FEAT_XS. Unless we
operate with DVM enabled, TLBIs broadcasting is accomplished in 0 time;
so there is no timing benefit introduced by enabling FEAT_XS other than
the way it affects TLB management (invalidation)

Change-Id: I067cb8b7702c59c40c9bbb8da536a0b7f3337b5d

Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-07-02 08:52:59 +01:00
Matt Sinclair
04a3fd5b5d gpu-compute,mem-ruby: Add RubyHitMiss flag for TCP and TCC cache (#1260)
Add hit and miss print for TCP and TCC cache with RubyHitMiss debug flag

Change-Id: I40ae3449020b917f39ac91d29fa4e1dd7c791e7b
2024-06-30 13:32:01 -05:00
Bobby R. Bruce
b3f23830c9 misc: Update versioning for develop branch
Develop for v24.1

Change-Id: I4ef34c4a4ef67d171505ff9380746ae193655305
2024-06-27 23:36:07 -07:00
Bobby R. Bruce
6fcc13cf55 misc: Merge branch stable into develop
This guarantees all changes put on the staging branch and, for whatever
reason, put on stable are on develop. This syncs the branches.

Change-Id: Ib3513f49977bb4ed3046c2d9d6cf162953b15887
2024-06-27 23:27:21 -07:00
Harshil Patel
3acb6e59cf resources: Update elfie.py to work with obtain_resources (#1289)
Change-Id: I08c5e50a150c8434c6c2ca36af81fb6ec3915af8
2024-06-27 20:02:57 -07:00
Jarvis Jia
f56571fed9 Merge branch 'develop' into rubyhitmiss 2024-06-27 21:45:08 +08:00
Rajesh Shashi Kumar
3ce5e0584a arch-arm: This commit fixes a typo in the ARM ldaddalx instruction (#1279)
The acquire-release flavor of the ldadd instruction should read ldaddalx
(eg. ldaddalb/ldaddalh) according to specification. However, this is
currently noted as ldadd"la"x (eg. ldaddlab/ldaddlah).

Issue: https://github.com/gem5/gem5/issues/1224
Change-Id: Ib932fa0e572207729c923c27f24c34cc21dff0e5

Co-authored-by: Bobby R. Bruce <bbruce@ucdavis.edu>
2024-06-26 09:03:50 -07:00
Harshil Patel
e0d03fbc2f resources: fix check for additional_params for workloads
Change-Id: I0a4b5f0eef6e2f9faf35cea8130572a066aab6cd
2024-06-26 07:13:04 -07:00
Harshil Patel
144a2071fe resources: fix check for additional_params for workloads
Change-Id: I0a4b5f0eef6e2f9faf35cea8130572a066aab6cd
2024-06-25 16:30:07 -07:00
Harshil Patel
241b8a09df resources: Update client_query to trim gem5 version (#1284)
- gem5 was querying the full version of gem5 that is `24.0.0.0` while
searching for resources.
This was causing an error to find resources on staging branch. 
This change trims the gem5 version to be just the major.minor version.

Change-Id: I30c3a1b38c631981f797ef0fd2b616e6a66ca18e
2024-06-25 09:04:13 -07:00
Harshil Patel
52fde944a5 resources: Update client_query to trim gem5 version (#1284)
- gem5 was querying the full version of gem5 that is `24.0.0.0` while
searching for resources.
This was causing an error to find resources on staging branch. 
This change trims the gem5 version to be just the major.minor version.

Change-Id: I30c3a1b38c631981f797ef0fd2b616e6a66ca18e
2024-06-25 09:01:36 -07:00
Jarvis Jia
341c72839b Fix hit issue
Change-Id: I28745489de693591d5ad8453b035a8c782adaf1f
2024-06-24 11:19:51 -07:00
Jarvis Jia
21b69975a6 Fix compilation error
Change-Id: I8273472b8d0cff8c02f2d1e1a9d66599af7c4866
2024-06-24 11:19:51 -07:00
Jarvis Jia
e957a882ed gpu-compute,mem-ruby: Add RubyHitMiss flag for TCP and TCC cache
Add hit and miss print for TCP and TCC cache with RubyHitMiss debug flag

Change-Id: I40ae3449020b917f39ac91d29fa4e1dd7c791e7b
2024-06-24 11:19:51 -07:00
Mahyar Samani
21bd1c28ab Adding an example for Spatter (#1272)
This change adds a new utility function for processing Spatter traces
into SpatterKernels under parse_kernels.
Additionally, it adds documentation for all the utility functions in
spatter_kernel.py.
Lastly, it adds an example script for running one spatter trace using
SpatterGenerator to the examples.
2024-06-21 02:26:58 -07:00
Mahyar Samani
30bfdc8e52 stdlib: Getter method to get monolith range. (#1273)
This change extend the AbstractMemory class to add a getter method that
allows other components to get the memory's range without interleaving.
This method will be useful if other components in the system need to
interleave the memory range different to the way the memory has
interleaved them.
2024-06-21 02:26:50 -07:00
Mahyar Samani
18bc5227f6 stdlib: Getter method to get monolith range. (#1273)
This change extend the AbstractMemory class to add a getter method that
allows other components to get the memory's range without interleaving.
This method will be useful if other components in the system need to
interleave the memory range different to the way the memory has
interleaved them.
2024-06-21 02:23:58 -07:00
Mahyar Samani
590bb1fbbb Adding an example for Spatter (#1272)
This change adds a new utility function for processing Spatter traces
into SpatterKernels under parse_kernels.
Additionally, it adds documentation for all the utility functions in
spatter_kernel.py.
Lastly, it adds an example script for running one spatter trace using
SpatterGenerator to the examples.
2024-06-21 02:23:41 -07:00
Bobby R. Bruce
d9d7d7646a misc: Update Doxygen version to v24.0.0.0
Change-Id: Ibaa04b09813a1d497727ed9d2a903ee2b3049ffd
2024-06-20 13:53:20 -07:00
Bobby R. Bruce
888bf0d693 base: Update src/base/version.cc for v24.0
Change-Id: Iac980772a42853f9bfbdadb65d5efc3c5fdb6aed
2024-06-20 13:53:07 -07:00
Jason Lowe-Power
013f773d31 arch-riscv: Fix TLB lookup with vaddrs (#1264)
Previously, all of the TLB lookup/insert functions were using the full
virtual addresses even though the variables in the functions said "vpn."
This change explicitly converts the virtual address to the VPN without
any least significant zeros for the offset. I.e., vpn >> page_size.

The main bug solved in this changeset is the asid was |'d with the upper
bits of the virtual address, but sometimes there were all 1's.
Therefore, you could get a TLB hit even if the ASID was different.
Interestingly, the page that seemed to cause these issues was a 1 GiB
page.

This change also starts refactoring some of the page table details to
support sv46 and sv57 page table formats.

In my testing, the Linux kernel boot uses large pages (even OpenSBI uses
large pages), so it seems that large pages also work. However, this
seems like magic to me, so I'm not sure if it's correct.

This change also updates some asserts, and debug statements with more
useful debugging information.

Partially fixes #1235. More testing needs to be done to be confident.
2024-06-20 13:24:50 -07:00
Bobby R. Bruce
7137b73ca0 cpu: Fix std::min type mismatch in reg_class.hh (#1266)
Introduced in #1234, this caused compilation to faill in Apple Silicon
systems. This bug is the same as #582 where a more detailed explanation
is provided.
2024-06-20 13:02:08 -07:00
Mahyar Samani
7ff1e381c9 cpu,stdlib: Fix Access Trace for Accessing Indices in SpatterGen (#1258)
This change fixes the way indices are generated in a multi generator
setup.
It changes it from all cores generating the same trace of indices for
accessing the index array to each core generating an interleaved subset
of indices.
For an example look below for traces (indices to index array) in a 2
core setup.

Before:
core_0: 0, 1, 2, 3, 4, 5, 6, 7, ...
core_1: 0, 1, 2, 3, 4, 5, 6, 7, ...
After:
core_0: 0, 1, 2, 3, 8, 9, 10, 11, ...
core_1: 4, 5, 6, 7, 12, 13, 14, 15, ...

Additionally, this change fixes the SpatterKernel class in the standard
library to comply with the change in the SpatterGen source code.
2024-06-20 11:24:44 -07:00
TiredTumblrina
9fb0b18863 gpu-compute,mem,systemc: This commit corrects typos of 'cache' (#1263)
I noticed while using the stable branch that there were a few typos of
the word 'cache' and so I've corrected a few files where I found such
typos.

Change-Id: I7c7f64812039f34fe39d0c45c4f5ce921cba06d0
2024-06-20 09:45:13 -07:00
Jason Lowe-Power
943daeb603 stdlib: Add function to append kernel args (#1262)
Often, you want to add another argument to the default kernel arguments.
This function allows you to do that on the `kernel_disk_workload` board
mixin.
2024-06-20 09:14:55 -07:00
Bobby R. Bruce
1a00ecfaf9 stdlib,configs,tests: Add gem5 MultiSim (MultiProcessing for gem5) (#1167)
This allows for multiple gem5 simulations to be spawned from a single
parent gem5 process, as defined in a simgle gem5 configuration. In this
design _all_ the `Simulator`s are defined in the simulation script and
then added to the mutlisim module. For example:

```py
from gem5.simulate.Simulator import Simulator
import gem5.utils.multisim as multisim

# Construct the board[0] and board[1] as you wish here...

simulator1 = Simulator(board=board[0], id="board-1")
simulator2 = Simulator(board=board[1], id="board-2")

multisim.add_simulator(simulator1)
multisim.add_simulator(simulator2)
```

This specifies that two simulations are to be run in parallel in
seperate threads: one specified by `simulator1` and another by
`simulator2`. They are then added to MultiSim via the
`multisim.add_simulator` function. The user can specify an id via the
Simulator constructor. This is used to give each process a unique id and
output directory name. Given this, the id should be a helpful name
describing the simulation being specified. If not specified one is
automatically given.

To run these simulators we use `<gem5 binary> -m gem5.utils.multisim
<script> -p <num_processes>`. Note: multisim is an executable module in
gem5. This is the same module we input into our scripts to add the
simulators. This is an intentionally modular encapsulated design. When
the module processes a script it will schedule multiple gem5 jobs and,
dependent on the number of processes specified, will create child gem5
processes to processes tjese jobs (jobs are just gem5 simulations in
this case). The `--processes` (`-p`) argument is optional and if not
specified the max number of processes which can be run concurrently will
be the number of available threads on the host system.

The id for each process is used to create a subdirectory inside the
`outputdor` (`m5out`) of that id name. E.g, in the example above the
ID's are `board-1` and `board-2`. Therefore the m5 out directory will
look as follows:

```sh
- m5out
    - board-1
        - stats.txt
        - config.ini
        - config.json
        - terminal.out
    - board-2
        - stats.txt
        - config.ini
        - config.json
        - terminal.out
```

Each simulations output is encapsulated inside the subdirectory of the
id name.

If the multisim configuation script is passed directly to gem5 (like a
traditional gem5 configuraiton script, i.e.: `<gem5 binary> <script>`),
the user may run a single simulation specified in that script by passing
its id as an argument. E.g. `<gem5 binary> <script> board-1` will run
the `board-1` simulation specified in `script`. If no argument is passed
an Exception is raised asking the user to either specify or use the
MultiSim module if multiprocessing is needed.

If the user desires a list of ids of the simulations specified in a
given MultiSim script, they can do so by passing the `--list` (`-l`)
parameter to the config script. I.e., `<gem5 binary> <script> --list`
will list all the IDs for all the simulations specified in`script`.

This change comes with two new example scripts found in
'configs/example/gem5_library/multsim" to demonstrate multisim in both
an SE and FS mode simulation. Tests have been added which run these
scripts as part of gem5' Daily suite of tests.

Notes
=====

* **Bug fixed**: The `NoCache` classic cache hierarchy has been modified
so the Xbar is no longet set with a `__func__` call. This interfered
with MultiProcessing as this structure is not serializable via Pickle.
This was quite bad design anyway so should be changed

* **Change**: `readfile_contents` parameter previously wrote its value
to a file called "readfile" in the output dorectory. This has been
changed to write to a file called "readfile_{hash}" with "{hash}" being
a hash of the `readfile_contents`. This ensures that, during multisim
running, this file is not overwritten by other processes.

* **Removal note**: This implementation supercedes the functionality
outlined in 'src/python/gem5/utils/multiprocessing'. As such, this code
has been removed.

Limitations/Things to Fix/Improve
=================================

* Though each Simulator process has its own output directory (a
subdirectory within m5out, with an ID set by the user unique to that
Simulator), the stdout and stderr are still output to the terminal, not
the output directory. This results in: 1. stdout and stderr data lost
and not recorded for these runs. 2. An incredibly noisy terminal output.
* Each process uses the same cached resources. While there are locks on
resources when downloading, each processes will hash the resources they
require to ensure they are valid. This is very inefficient in cases
where resources are common between processes (e.g., you may have 10
processes each using the same disk image with each processes hashing the
disk images independently to give the same result to validate the
resources).

Change-Id: Ief5a3b765070c622d1f0de53ebd545c85a3f0eee

---------

Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Co-authored-by: Jason Lowe-Power <jason@lowepower.com>
2024-06-18 09:34:39 -07:00
Bobby R. Bruce
3138c8a8b1 gpu-compute,mem-ruby: Revert "Add RubyHitMiss flag for TCP and TCC cache" (#1254)
Reverts gem5/gem5#1226
2024-06-18 07:58:54 -07:00
Bobby R. Bruce
36f73f671d cpu,stdlib: Adding Spatter (#1136)
This PR adds source code for C++ implementation of SpatterGen as well as
SpatterKernel. SpatterGen uses a PyBindMethod to add kernels to the
backend code. This way the process of processing json files could be
offloaded to python. In addition it adds standard library components for
SpatterGenCore and SpatterGen. These two components follow the same
structure as AbstractCore and AbstractProcessor. In addition
spatter_kernel.py adds a definition for SpatterKernel in python to make
adding kernels to C++ easier. Also it adds utility functions for parsing
dictionaries read from json as well as partitioning traces for multicore
setups.
2024-06-17 15:28:45 -07:00
Hoa Nguyen
15e0236a8b arch,cpu,sim: Add mechanism to partially print vector regs (#1234)
Currently, gem5's inst tracer prints the whole vector register container
by default. The size of vector register containers in gem5 is the
maximum size allowed by the ISA. For vector-length agnostic (VLA) vector
registers, this means ARM SVE vector container is 2048 bits long, and
RISC-V vector container is 65535 bits long. Note that VLA implementation
in gem5 allows the vector length to be varied within the limit specified
by the ISAs.

However, in most use cases of gem5, the vector length is much less than
65535 bits. This causes two issues: (1) the vector container requires
allocating and moving around a large amount of unused data while only a
fraction of it is used, and (2) printing the execution trace of a vector
register results in a wall of text with a small amount of useful data.

This change addresses the problem (2) by providing a mechanism to limit
the amount data printed by the instruction tracer. This is done by
adding a function printing the first X bits of a vector register
container, where X is the vector length determined at runtime, as
opposed to the vector container size, which is determined at compilation
time.

Change-Id: I815fa5aa738373510afcfb0d544a5b19c40dc0c7

---------

Signed-off-by: Hoa Nguyen <hn@hnpl.org>
2024-06-17 14:05:47 -07:00
hahaxxz
fef6a97f93 mem-ruby: This commit fixes MI_example protocol (#1236)
fix two bugs in MI_example-dir.sm:
1. Directory cannot handle DMA_READ & DMA_WRITE events in M_DRDI state.
2. Directory cannot handle PUTX_NotOwner events in {M_DWR, M_DRD,
M_DRDI, M_DWRI} state.

Github Issue: https://github.com/gem5/gem5/issues/1210

Change-Id: I52a9d674ce0688dcfbbcc2b583f17de95afdeb87
2024-06-17 12:45:11 -07:00
Hoa Nguyen
500da4306b arch: Mark FailUnimplemented instructions as Invalid instructions (#1247)
This is a follow-up on the discussion here [1].

The IsInvalid flag was previously defined as an instruction that does
not appear in the ISA. However, a micro-architecture can choose to not
recognize an instruction in and raise illegal instruction fault even if
the instruction is in the ISA.

This change modifies the definition of a Invalid instruction such that,
if a StaticInst instruction is marked as IsInvalid, it means the
instruction is not recognized by the decoder. This means that any
instruction recognized by the decoder are not invalid, even if the
instruction is not in the official ISA spec; e.g., m5
pseudo-instructions.

Note that instructions that are recognized by the decoder but are chosen
to act as a nop are not invalid. This applies to WarnUnimplemented
instructions, e.g. hint instructions.

[1] https://github.com/gem5/gem5/pull/1071

Change-Id: I1371b222d8b06793d47f434d0f148c5571672068

Signed-off-by: Hoa Nguyen <hn@hnpl.org>
2024-06-17 12:44:05 -07:00
Giacomo Travaglini
2804311f7b cpu-o3: Revert "Do not set Executed on load instruction to be replayed" (#1251)
Reverts gem5/gem5#1182

This is breaking O3 execution. Investigating the matter
2024-06-17 12:24:43 -07:00
Matt Sinclair
6776bebbf6 gpu-compute,mem-ruby: Add RubyHitMiss flag for TCP and TCC cache (#1226)
Add hit and miss print for TCP and TCC cache with RubyHitMiss debug flag

Change-Id: I4430532b901811e03d9b077b61e2eca4557b34e1
2024-06-17 12:47:47 -05:00
Jarvis Jia
3a2bf47d57 Add default value and change Ruby address format specifier
Change-Id: I8fbaf34745e90589e610d3b9bd423937e7ebdc3d
2024-06-17 03:27:25 -05:00
Jarvis Jia
edb2e76077 Merge branch 'develop' into rubyhitmiss 2024-06-17 15:57:50 +08:00
Matthew Poremba
2b0ca93517 gpu-compute: Fix architected flat scratch
Currently writing to SRF which is incorrect, as the physical register
number can be clobbered by another wavefront if registers get renamed to
the physical register number.

Fix this by actually architecting the register, i.e., there is a
dedicated "hardware" register in the wavefront class.

Change-Id: I94e9e463eed348b2928cae884c1c20566c00984d
2024-06-15 15:46:33 -07:00
Matthew Poremba
2f5842d253 arch-vega: Add valid flag to ds_swizzle_b32
Currently the flag is just Load and there is a long comment explaining
why. This does not meet any of the scoreboard check requirements:

https://github.com/gem5/gem5/blob/develop/src/gpu-compute/scoreboard_check_stage.cc#L230-L241

Add a generic ALU flag as well so the instruction executes instead of
panicking.

Change-Id: I54b2d20d47fad5e8f05f927328433aab7db7d862
2024-06-15 14:28:59 -07:00
Matthew Poremba
42369eab2c arch-vega: Implement MI300 FLAT SVE bit
For scratch instructions only, this bit specifies if an offset in a VGPR
should be used for address calculation. This is new in MI300 and was
previously the LDS bit. The LDS bit is rarely used and in fact gem5 does
not even check this bit.

This fixes a bug when SADDR == 0x7f (i.e., no SGPR should be used) where
a VGPR was being added to the address when it should have been ignored.

Change-Id: I9864379692df6795b25b58b98825da05d18fc5db
2024-06-15 14:28:59 -07:00
Matthew Poremba
1dab4be002 arch-vega: Implement VOP3 V_FMAC_F32
A version of V_FMAC_F32 with extra modifiers from VOP3 format.

Change-Id: Ib6b41b0a3ceb91269b91a0287dfc94bc73e4d217
2024-06-15 14:28:58 -07:00
Matthew Poremba
f91d14fe46 gpu-compute: Add MFMA stats (#1248)
Add dynamic instruction counts for MFMAs.

Change-Id: I976b01344577cf011aeb3dd648a8c0017281c4e3
2024-06-15 13:04:00 -07:00
Mahyar Samani
d661023de4 stdlib: Adding SpatterGenCore and SpatterGen
This change adds code for SpatterGenCore and SpatterGen as well
as SpatterKernel to the standard library. SpatterGenCore and
SpatterGen follow the same structure as AbstractCore and
AbstractProcessor. spatter_kernel.py adds utility functions
to parse dictionaries as well as partition a list into
multiple lists through interleaving to be used when setting up
a multicore SpatterGen.

Change-Id: I003553e97f901c0724f5feac0bb6e21a020bd6ad
2024-06-14 13:44:34 -07:00
Mahyar Samani
6695e5ef70 cpu: Adding SpatterGen
This change adds source code for SpatterGen ClockedObject.
The set of source code pushed includes code for SpatterKernel
that tracks whether information is being gathered or scattered
as well as the list of indices to be accessed. This model
has PyBindMethod to add SpatterKernels from python.
This way all the preparations for kernels can be done in python.
SpatterGen has a few parameters that model limits on a few of
hardware resources in the backend of a processor, e.g. number
of functional units to calculate effective address, the latency
of calculating effective address, number of integer registers.

Change-Id: I451ffb385180a914e884cab220928c5f1944b2e3
2024-06-14 10:45:09 -07:00
Minje Jun
b8e21a2d32 cpu-o3: Do not set Executed on load instruction to be replayed (#1182)
A load instruction can be replayed when
1) it's strictly ordered or
2) it falls into load-store forwarding mismatch.

Case 1 was considered in executeLoad function but the case 2 wasn't. It
causes the case-2 replayed load instruction to violate the assertion
condition "assert(!load_inst->isExecuted())" in LSQUnit::read. This
commit fixes the problem by adding consideration of the case 2 in
LSQUnit::executeLoad.

Co-authored-by: Minje Jun <minje.jun@samsung.com>
2024-06-14 10:12:26 -07:00
Jason Lowe-Power
21ffd91529 cpu,arch: Add IsInvalid flag to Unknown insts (#1071)
The IsInvalid flag indicates that the static instruction is not part of
the executing ISA and not part of m5's pseudo-instructions. This flag
provides a way to recognize an illegal instruction at the decode stage.
2024-06-13 16:26:35 -07:00