Commit Graph

14952 Commits

Author SHA1 Message Date
Matthew Poremba
6bbde8fbb8 dev-amdgpu: Rework handling of unknown registers
The top level AMDGPUDevice currently reads/writes all unknown registers
to/from a map containing the previously written value. This is intended
as a way to handle registers that are not part of the model but the
driver requires for functionality. Since this is at the top level, it
can mask changes to register values which do not go through the same
interface. For example, reading an MMIO, changing via PM4 queue, and
reading again returns the stale cached value.

This commit removes the usage of the regs map in AMDGPUDevice,
implements some important MMIOs that were previously handled by it, and
moves the unknown register handling to the NBIO aperture only. To reduce
the number of additional MMIOs to implement, the display manager in
vega10 is now disabled.

Change-Id: Iff0a599dd82d663c7e710b79c6ef6d0ad1fc44a2
2024-03-21 10:10:01 -05:00
Matthew Poremba
009cec56e0 dev-amdgpu: Check for SDMA copies to GART range
The SDMA engine can potentially be used to write to the GART address
range. Since gem5 has a shadow copy of the GART table to avoid sending
functional reads to device memory, the GART table must be updated when
copying to the GART range.

This changeset adds a check in the VM for GART range and implements the
SDMA copy packet writing to the GART range. A fatal is added to write
and ptePde, which are the only other two ways to write to memory, as
using these packets to update the GART table has not been observed.

Change-Id: I1e62dfd9179cc9e987659e68414209fd77bba2bd
2024-03-21 10:10:01 -05:00
Matthew Poremba
998709d4fc dev-amdgpu: Improve PM4 write data packet
The write data packet can write multiple dwords but currently always
assumes there is one dword, which can cause some write data to be
missed. This case is not common, but the number of dwords is implicitly
defined in the PM4 header.

This changeset passes the PM4 header to write data so that the correct
number of dwords can be determined. For now we assume no page crossing
when writing multiple dwords as the driver should be checking for that.

Change-Id: I0e8c3cbc28873779f468c2a11fdcf177210a22b7
2024-03-21 10:10:01 -05:00
Matthew Poremba
c045c68540 dev-amdgpu: Add node_id to interrupt handler
The ROCm 6.0 driver adds a node_id field to interrupts which must match
before passing on the interrupt to be cleared by the cookie from gem5's
interrupt handler implementation. Add this field and enable for gfx942.

The usage of the field can be seen in event_interrupt_isr_v9_4_3 at
https://github.com/ROCm/ROCK-Kernel-Driver/blob/roc-6.0.x/drivers/
    gpu/drm/amd/amdkfd/kfd_int_process_v9.c#L449

Change-Id: Iae8b8f0386a5ad2852b4a3c69f2c161d965c4922
2024-03-21 10:10:01 -05:00
Matthew Poremba
9ab004cccc arch-vega: Implement V_LSHL_ADD_U64
This is a new instruction in MI300 and operates similar to
V_LSHL_ADD_U32 but on 64-bit values.

Change-Id: Ia4ac65160bdad748fccdcb28286ba03157cc4046
2024-03-21 10:10:01 -05:00
Matthew Poremba
f36be791aa arch-vega: Expand FLAT subDecode range in main decoder
The main decoder for GPU instructions looks at the first 9 bits of a
dword to determine either the instruction or a subDecode table with more
information for specific instructions types. For flat instructions the
first 9 bits currently consist of 6 fixed encoding bits, a reserved bit,
and the first two bits of the opcode. Hence to support all opcodes there
are four indirections to the flat subDecode table. In MI300 the reserved
bit is part of a field to determine memory scope and therefore may be
non-zero.

This commit adds four addition calls to the subDecode table for the
cases where the scope bit is 1. See page 468 (PDF page 478) below:

https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/
    instruction-set-architectures/
    amd-instinct-mi300-cdna3-instruction-set-architecture.pdf

Change-Id: Ic3c786f0ca00a758cbe87f42c5e3470576f73a32
2024-03-21 10:10:01 -05:00
Michael Boyer
acd9d3ff94 gpu-compute: Add support for skipping GPU kernels (#940)
gpu-compute: Add support for skipping GPU kernels

This commit adds two new command-line options:

--skip-until-gpu-kernel N
Skips (non-blit) GPU kernels until the target kernel is reached.
Execution continues normally from there. Blit kernels are not skipped
because they are responsible for copying the kernel code and metadata
for the non-blit kernels. Note that skipping kernels can impact
correctness; this feature is only useful if the kernel of interest has
no data-dependent behavior, or its data-dependent behavior is not based
on data generated by the skipped kernels.

--exit-after-gpu-kernel N
Ends the simulation after completing (non-blit) GPU kernel N.

This commit also renames two existing command-line options:
--debug-at-gpu-kernel -> --debug-at-gpu-task
--exit-at-gpu-kernel  -> --exit-at-gpu-task

These were renamed because they count GPU tasks, which include both
kernels launched by the application as well as blit kernels.

Change-Id: If250b3fd2db05c1222e369e9e3f779c4422074bc
2024-03-21 07:46:27 -07:00
Michael Boyer
ba2f5615ba gpu-compute: Support cache line sizes >64B in GPUFS (#939)
This change fixes two issues:

1) The --cacheline_size option was setting the system cache line size
but not the Ruby cache line size, and the mismatch was causing assertion
failures.

2) The submitDispatchPkt() function accesses the kernel object in
chunks, with the chunk size equal to the cache line size. For cache line
sizes >64B (e.g. 128B), the kernel object is not guaranteed to be
aligned to a cache line and it was possible for a chunk to be partially
contained in two separate device memories, causing the memory access to
fail.

Change-Id: I8e45146901943e9c2750d32162c0f35c851e09e1

Co-authored-by: Michael Boyer <Michael.Boyer@amd.com>
2024-03-20 11:09:25 -07:00
Giacomo Travaglini
2b67d0eba6 stdlib, tests, configs: Add a new PrivateL1PrivateL2WalkCache hierarchy (#935)
From [1] The PrivateL1PrivateL2Cache hierarchy has been amended with an
MMUCache, which is basically a small cache in front of the page table
walker.

Not every ISA makes use of it.
Arm for example already implements caching of page table walks, via the
partial_levels parameter in the ArmTLB.
With this patch we define a new module which explicitly makes use of the
WalkCache. Configurations that do not require
another cache in the first level of the memsys (for the ptw) can use the
PrivateL1PrivateL2CacheHierarchy
    
[1]: https://gem5-review.googlesource.com/c/public/gem5/+/49364
2024-03-19 09:04:32 +00:00
Yu-Cheng Chang
dbae09e4d9 arch-riscv: Move alignment check to Physical Memory Attribute(PMA) (#914)
In the RISC-V unprivileged spec[1], the misaligned load/store support is
depend on the EEI.
    
In the RISC-V privileged spec ver1.12[2], the PMA specify wether the
misaligned access is support for each data width and the memory region.
    
In the [3] of `mcause` spec, we cloud directly raise misalign exception
if there is no memory region misalignment support. If the part of memory
region support misaligned-access, we need to translate the `vaddr` to
`paddr` first then check the `paddr` later. The page-fault or
access-fault is rose before misalign-fault.
    
The benefit of moving check_alignment option from ISA option to PMA
option is we can specify the part region of memory support misalign
load/store.

MMU will check alignment with virtual addresss if there is no misaligned
memory region specified. If there are some misaligned memory region
supported, translate address first and check alignment at final.
    
[1]
https://github.com/riscv/riscv-isa-manual/blob/main/src/rv32.adoc#base-instruction-formats
[2]
https://github.com/riscv/riscv-isa-manual/blob/main/src/machine.adoc#physical-memory-attributes
[3]
https://github.com/riscv/riscv-isa-manual/blob/main/src/machine.adoc#machine-cause-register-mcause
2024-03-18 12:59:13 -07:00
Yan Lee
84da503d37 mem: Fix callback of functional access in port wrapper (#938)
In previous implementation of port_wrapper, recvFunctional() will call
timing request callback. This should be a typo and this change fix the
typo.
2024-03-18 08:21:43 -07:00
Giacomo Travaglini
d32a438913 stdlib: Add a new private_l1_private_l2_walk_cache_hierarchy.py module
From [1] The PrivateL1PrivateL2Cache hierarchy has been amended
with an MMUCache, which is basically a small cache in front
of the page table walker. Not every ISA makes use of it.

Arm for example already implements caching of page table
walks, via the partial_levels parameter in the ArmTLB.

With this patch we define a new module which explicitly makes
use of the WalkCache. Configurations that do not require
another cache in the first level of the memsys (for the ptw)
can use the PrivateL1PrivateL2CacheHierarchy

[1]: https://gem5-review.googlesource.com/c/public/gem5/+/49364

Change-Id: I17f7e68940ee947ca5b30e6ab3a01dafeed0f338
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-03-18 09:42:05 +00:00
Giacomo Travaglini
0ec8cf8d05 dev-arm: Fix SMMUv3 DTB autogen (#934)
Replacing FdtProperyWords (expecting an integer) with FdtPropertyStrings

Change-Id: Icd1cf00704e253c88ac9b1d69c3cf946d2a8ca70

Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-03-14 15:42:57 +00:00
Tiago Mück
942979162a READ_MODIFY_WRITE flag fix (#922)
Change bit for Request::READ_MODIFY_WRITE, which was the same as
Request::ACQUIRE.

Signed-off-by: Tiago Mück <tiago.muck@arm.com>
2024-03-11 08:32:11 -07:00
Giacomo Travaglini
5161195db5 dev-arm: Remove the SMMUv3 irq_interface_enable parameter
The SMMU_IRQ_CTRL had been made optionally writeable by a
prior patch [1] even if interrupts were not supported in
the SMMUv3 model.
As we are partially enabling IRQ support, we remove this option
and we make the SMMU_IRQ_CTRL always writeable

[1]: https://gem5-review.googlesource.com/c/public/gem5/+/38555

Change-Id: Ie1f9458d583a5d8bcbe450c3e88bda6b3c53cf10
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-03-08 13:53:44 +00:00
Giacomo Travaglini
d63282a9da dev-arm: Implement wired interrupt for SMMU event queue
See https://github.com/orgs/gem5/discussions/898

The SMMUv3 Event Queue is basically unused at the moment.  Whenever a
transaction fails we actually abort simulation.  The sendEvent method
could be used to actually report the failure to the driver but it is
lacking interrupt support to notify the PE there is an event to handle.
The SMMUv3 spec allows both wired and MSI interrupts to be used.

We add the eventq_irq SPI param to the SMMU object and we draft an
initial sendInterrupt utility that makes use of it whenever it is
needed.

Change-Id: I6d103919ca8bf53794ae4bc922cbdc7156adf37a
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-03-08 13:53:21 +00:00
Giacomo Travaglini
63c815b5fc dev-arm: Do not panic in the SMMUv3 for fauting transactions
Rely on the architected solution instead of aborting simulation.
This means handling writes to the Event queue to signal managing
software there was a fault in the SMMU

Change-Id: I7b69ca77021732c6059bd6b837ae722da71350ff
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-03-08 11:29:22 +00:00
Giacomo Travaglini
7d5d1cd9c8 dev-arm: Rewrite SMMUEvent
The struct fields of the SMMUEvent were not matching the SMMUv3 specs.
This was "not an issue" as events have been implicitly disabled until
now (every translation error was aborting simulation)

With generateEvent we automatically construct a SMMU event from
a translation result.

Change-Id: Iba6a08d551c0a99bb58c4118992f1d2b683f62cf
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-03-08 11:29:22 +00:00
Giacomo Travaglini
ef10db5a3e dev-arm: Record additional information in the TranslResult
A faulting translation should return additional information
(other than the fault type). This will be used by future
patches to properly populate the SMMU event record of the
event queue

As we currenlty support two faults only:

1) F_TRANSLATION
2) F_PERMISSION

We add to TranslResult the relevant fault information only:
type, class, stage and ipa

Change-Id: I0a81d5fc202e1b6135cecdcd6dfd2239c2f1ba7e
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-03-08 11:29:22 +00:00
Giacomo Travaglini
3d1f68f205 dev-arm: Return translation fault in doReadCD
Reading the Context Descriptor (CD) might require a stage2
translation. At the moment doReadCD does not check for the
return value of the translateStage2.
This means that any stage2 fault will be silently discarded
and an invalid address will be used/returned.

By returning a translation result we make sure any error
happening in the second stage of translation will be properly
flagged

Change-Id: I2ecd43f7e23080bf8222bc3addfabbd027ee8feb
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-03-08 11:29:22 +00:00
Giacomo Travaglini
4a4b775985 dev-arm: Provide encapsulation by adding TranslResult::isFaulting
We don't check the fault type directly. This will improve
readability once the TranslResult class will be augmented
with extra fields

Change-Id: I5acafaabf098d6ee79e1f0c384499cc043a75a9d
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-03-08 11:29:22 +00:00
Ivan Fernandez
f6c61836b3 arch-riscv: adding vector unit-stride segment loads to RISC-V (#851)
This commit adds support for vector unit-stride segment load operations
for RISC-V (vlseg<NF>e<X>). This implementation is based in two types of
microops:
- VlSeg microops that load data as it is organized in memory in structs
of several fields.
- VectorDeIntrlv microops that properly deinterleave structs into
destination registers.

Gem5 issue: https://github.com/gem5/gem5/issues/382
2024-03-06 11:27:06 -08:00
Giacomo Travaglini
3d2052bc03 misc: Serialize the ISA as a string in the checkpoint
With the introduction of multi-ISA gem5, we don't store the TARGET_ISA
anymore as a string in the root section of the checkpoint [1].  There is
therefore no way at the moment to asses the ISA of a CPU/ThreadContext.
This is a problem when it comes to checkpoint updates which are ISA
specific.

By explicitly serializing the ISA as a string under the cpu.isa section
we avoid this problem and we let cpt_upgraders be aware of the ISA in
use.

[1]: https://gem5-review.googlesource.com/c/public/gem5/+/48884

Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Change-Id: I1e75230cbc370cab84f4a54141b1e425af2dbfac
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
2024-03-04 17:51:40 +00:00
Nitish Arya
676d571009 arch-riscv: adding stats to show completed page walks (#869)
This commit adds statistics showing completed page walks for 4KB and 2MB
pages. This will add to stats.txt the variables num_4kb_walks,
num_2mb_walks and the corresponding values. This is done based on the
level of page table walk traversed specific to Sv39 Virtual Memory
System.
2024-03-04 08:38:28 -08:00
Giacomo Travaglini
c57a6b0d59 mem-cache: Add support for partitioning caches (#765)
* Add Cache partitioning policies to manage and enforce cache
partitioning:
    * Add Way partition policy 
    * Add MaxCapacity partition policy
* Add PartitionFieldsExtension Extension class for Packets to store
Partition IDs for cache partitioning and monitoring
* Modify Cache SimObjects to store partition policies
* Modify Cache block eviction logic to use new partitioning policies

Co-authored-by: Adrian Herrera <adrian.herrera@arm.com>

Change-Id: Ib35153a8b46803c22a433926270d82e5e19ce544
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-03-04 09:44:01 +00:00
Giacomo Travaglini
c1d5ffe7c7 mem-cache: Prefetchers Improvements (#872)
This pull request contains a set of small patches which fix some bugs in
the gem5 prefetchers, and aligns out-of-the box prefetcher performance
more closely with that which a typical user would expect.

The performance patches have been tested with an out-of-the-box
(untuned) Stride prefetcher configuration against a set of SPEC 2017
SimPoints, and show a small-to-modest IPC uplift across the about half
the benchmarks, with no significant IPC degradation.

The new defaults were identified as part of work on gem5 prefetchers
undertaken by Nikolaos Kyparissas while on internship at Arm.

This PR is an updated version of PR #564, which was reverted due to Bug
#580. Bug #580 was fixed in PR #871. This PR updates #564 to the latest
state of the develop branch, and should be applied after PR #871.
2024-03-04 09:09:47 +00:00
Ivana Mitrovic
fae5f5e00b sim-se: Catch None value if binary is not compatible with gem5 (#903)
Adding an error message in case the binary is not compatible with gem5.

This PR is addressing the comments in issue #807.

Change-Id: I66466ed6f657276c13d237fde3b1ec12c20cfe91
2024-03-01 16:41:18 -08:00
Ivana Mitrovic
61adfa38b2 stdlib: Fix initialization for self.pic.hart_config in lupv_board (#904)
Previously merged PR #886 created pic.hart_config, but it was not
initialized properly in lupv_board.py. This issue is causing daily tests
to fail.

Change-Id: I193ff4a3e5ef787eefcf066404e762f024fa6603

---------

Co-authored-by: Yu-Cheng Chang <aucixw45876@gmail.com>
2024-03-01 11:25:00 -08:00
Giacomo Travaglini
c0e5d58a96 dev: RegisterBank addRegistersAt for fragmented reg banks (#902)
One of the limitations of the RegBank class is that it does not allow
you to pass a non-contiguous set of registers. Its simplest form will
just accept an initializer list of registers and it will store them in
sequence.

A more refined version [1] will optionally accept an offset value to be
passed alongside the register reference. This is not meant to be used by
the register bank to store the register at the provided offset.

It is rather used by the bank to sanity check the register sits exactly
at the provided range.

The way to work around this for a fragemented register space is to
explicitly allocate RAZ/RAO blocks as registers and to pass them to
addRegisters together with the others. (See the SysSecCtrl [2] as an
example)

This makes it a bit tedious to model a register bank with gaps between
its registers. First, the exact number and position of the gaps needs to
be extraced from a spec. These sometimes report only implemented
registers and their offset, and omit to document gaps/reserved space. So
a developer needs to manually add register offset and size to check if
all registers are contiguous. Second, these reserved register blocks
need to be instantiated in the bank adding boilerplate code and
affecting readibility.

For these reasons we add a new registration method, called
addRegisters*At*. It reuses the RegisterAdder class but this time the
offset field is really used to instruct the bank where the register
should be mapped. The method is templated and the template parameter
tells the bank which register type should be used to fill the remaining
space. We make the RegBank the owner of this filler space (registers are
generated internally within addRegistersAt).

[1]: https://github.com/gem5/gem5/blob/stable/src/dev/reg_bank.hh#L106
[2]: https://github.com/gem5/gem5/blob/stable/src/dev/arm/ssc.cc#L48

Change-Id: I614ae6e9eeb40b365ac9b6dd8b75abbfdb9cb687

Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-03-01 15:32:40 +00:00
Hristo Belchev
27c8355565 mem-cache: Add support for partitioning caches
* Add Cache partitioning policies to manage and enforce cache partitioning:
    * Add Way partition policy
    * Add MaxCapacity partition policy
* Add PartitionFieldsExtension Extension class for Packets to store
  Partition IDs for cache partitioning and monitoring
* Modify Cache Tags SimObjects to store partition policies
* Modify Cache Tags block eviction logic to use new partitioning policies
* Add example system and TrafficGen configurations for testing Cache
  Partitioning Policies

Change-Id: Ic3fb0f35cf060783fbb9289380721a07e18fad49
Co-authored-by: Adrian Herrera <adrian.herrera@arm.com>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-03-01 15:26:38 +00:00
Mahyar Samani
9bd71bff0c python: Adding fatal statement to notify user mistakes. (#826)
This change adds a fatal statement to check all params for all
SimObjects have been unproxied before C++ object are created.
The fatal statement notifies the user of a mistake that could
possibly lead to a SimObject to not have its params unproxied.
This mistake could be made by adding a child SimObject with a
name that starts with an underscore.
2024-02-29 10:47:26 -08:00
Matthew Poremba
db42aeb630 arch-vega: Implement accumulation offset (#895)
This PR implements a few changes related to the accumulation offset
which is new in MI200. Previously MI100 contained two vector register
files: the architectural and accumulation register files. These have now
been unified and the architectural register file is twice the size. As a
result of this the dispatch packet set an offset into the unified vector
register file for where the former accumulation registers would go. The
changes are:

- Calculate the accumulation offset from dispatch packet and store in
HSA task.
- Update the accumulation move instructions (v_accvgpr_read/write) to
use it.
- Update the current MFMA instructions to use it.
- Make the MFMA examples more clean.
2024-02-29 09:05:39 -08:00
Nicholas Mosier
69762e272e sim-se, arch-x86: initialize max stack size from parameter (#892)
Initialize x86 process' max stack size to the value given in the process
params, rather than hard-coding it to 8 MB, which made it impossible to
run x86 programs requiring more than 8 MB of stack.

Change-Id: I0b17fe60b016b1e4a82d704ef7ad367974ea6a08
2024-02-29 08:15:43 -08:00
amatabsc
0d79b5098b Increased packets sanity check limit to 1024 (#797)
For some simulations with big values for VLEN (e.g. 8k and 16k) there
were more packets created on the fly and, as a consequence, failing the
simulations. The sanity check has been increased in order to solve this
high VLEN cases.

Supervised by [@aarmejach](https://github.com/aarmejach)
Change-Id: I137b0f3113687b3fc9c4154d19ca5e8017e6e992

Co-authored-by: Adrià Armejach <adria.armejach@bsc.es>
2024-02-29 08:12:59 -08:00
Matt Sinclair
777ac91bb0 mem-ruby: Add categorization of bypassed atomics in TCC (#899)
Adds categorization of bypassed atomics in TCC to the TBE as either
return or no-return, which gets consumed in pa_performAtomic to
determine if atomic logs should be stored.

Reestablishes TCC bypassed atomics after #546.

Change-Id: Ibc1fa2b795ef1c47c3893a0b1911fa7993522d38
2024-02-28 14:26:09 -06:00
Daniel Kouchekinia
de615836f0 mem-ruby: Add categorization of bypassed atomics in TCC
Adds categorization of bypassed atomics in TCC to the TBE as either return
or no-return, which gets consumed in pa_performAtomic to determine if
atomic logs should be stored.

Reestablishes TCC bypassed atomics after #546.

Change-Id: Ibc1fa2b795ef1c47c3893a0b1911fa7993522d38
2024-02-27 23:12:45 -06:00
Daniel Kouchekinia
0fd73f4e05 Merge branch 'develop' into missing-tcc-transition 2024-02-27 16:46:30 -06:00
Hristo Belchev
e78a6b71fe Merge branch 'develop' into qos-qpolicy-assertions-fix 2024-02-27 09:38:34 +00:00
Matthew Poremba
2ca7f48828 arch-vega: Accumulation offset for existing MFMA insts
This commit update the two exiting MFMA instructions to support the
accumulation offset for A, B, and C/D matrix. Additionally uses array
indexed C/D matrix registers to reduce duplicate code. Future MFMA
instructions have up to 16 registers for C/D and this reduces the amount
of code being written.

Change-Id: Ibdc3b6255234a3bab99f115c79e8a0248c800400
2024-02-26 14:30:50 -06:00
Daniel Kouchekinia
6374697a20 mem-ruby: Add missing transition for SLC writes to VIPER TCC
Bypassed write though requests on invalid lines in the TCC should be
written though to the directory. This transition was previously
missing.

Change-Id: I16b117c4e085ce6be0ed5297aa0129d52cd35a51
2024-02-26 13:13:06 -06:00
Matthew Poremba
e0e65221b4 arch-vega: Use accum offset for v_accvgpr_read/write
The accum offset is used as an index into the unified VGPR register file
in MI200 and is not the same as a move if accum_offset in the dispatch
packet is non-zero.

Change these instructions to use the stored accum_offset value.

Change-Id: Ib661804f8f5b8392e4c586082c423645f539e641
2024-02-26 12:57:09 -06:00
Matthew Poremba
8722aef2e2 gpu-compute: Store accum_offset from code object in WF
The accumulation offset is needed for some instructions. In order to
access this value we need to place it somewhere instruction definitions
can access. The most logical place is in the wavefront.

This commit simply copies the value from the HSA task to the wavefront
object.

Change-Id: I44ef62ef32d2421953f096c431dd758e882245b4
2024-02-26 12:54:37 -06:00
Yu-Cheng Chang
bcf455755e arch-riscv,dev: Update the PLIC implementation (#886)
Update the PLIC based on the
[riscv-plic-spec](https://github.com/riscv/riscv-plic-spec) in the PR:
- Support customized PLIC hardID and privilege mode configuration
- Backward compatable with the n_contexts parameter, will generate the
config like {0,M}, {0,S}, {1,M} ...

Change-Id: Ibff736827edb7c97921e01fa27f503574a27a562
2024-02-26 10:32:53 -08:00
Ivana Mitrovic
61ee36eee6 mem-ruby: Fix possible dirty line loss in CHI when ReadShared hit on UD line (#791)
In case ReadShared hit on a UD line and there's no sharers, this chage
makes the downstream passes Dirty to the requestor whenever possible
even though it doesn't deallocate the line. This will make the requestor
to SD and the downstream to UD_RSD.
In the previous implementation, loosely exclusive intermediate cache can
cause loss of dirty data. Example error condition is as below.
   
Configurations
L2 cache: Roughly inclusive to L1 without back-invalidation
- dealloc_on_* = false
- dealloc_backinv_* = false
L3 cache: Roughly exclusive to L2 without back-invalidation
- alloc_on_readshared = tue
- alloc_on_readunique = false
- dealloc_on_shared = false
- dealloc_on_unique = true
- dealloc_backinv_* = false
- is_HN = false
LLC: Same clusivity as L3 except is_HN = true
For all caches, allow_SD = true and fwd_unique_on_readshared = false
    
Example problem sequence:
1. L1 sends ReadUnique then becomes UD. L2 is UC_RU. L3 and LLC are RU.
2. L1 evicts the line to L2 by WriteBackFull (UD_PD). L2 becomes UD.
3. L2 evicts the line to L3 using WriteBackFull (UD_PD). L3 becomes UD.
4. L1 reads the line with ReadShared which misses on L2.
5. L2 reads the line with ReadShared which hits on L3. L3 becomes UD_RSC
because it doesn't deallocate the line (dataToBeInvalid=false)
6. L3 evicts the line to LLC by WriteCleanFull (UD_PD) because L3
doesn't back-invalidate and still has sharer. The local cache line is
invalidated by Deallocate_CacheBlock. L3 becomes RUSC and LLC becomes
UD_RU.
7. When UD_RU is evicted at LLC, the UD_RU line is dropped expecting the
upstream to writeback, causing loss of dirty data
2024-02-26 10:06:17 -08:00
Giacomo Travaglini
1d5be8d9e5 mem-cache: Optimize strided prefetcher address generation
This commit optimizes the address generation logic in the strided
prefetcher by introducing the following changes

(d is the degree of the prefetcher)

* Evaluate the fixed prefetch_stride only once (and not d-times)
* Replace 2d multiplications (d * prefetch_stride and distance *
prefetch_stride) with additions by updating the new base prefetch
address while looping

Change-Id: I3ec0c642bc9ec7635b0d38308797e99b645304bb
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-02-26 10:40:45 +00:00
Nikolaos Kyparissas
a5fece3b91 mem: added distance parameter to stride prefetcher
The Stride Prefetcher will skip this number of strides ahead of the
first identified prefetch, then generate `degree` prefetches at
`stride` intervals. A value of zero indicates no skip (i.e. start
prefetching from the next identified prefetch address).

This parameter can be used to increase the timeliness of prefetches by
starting to prefetch far enough ahead of the demand stream to cover
the memory system latency.

[Richard Cooper <richard.cooper@arm.com>:
- Added detail to commit comment and `distance` Param documentation.
- Changed `distance` Param from `Param.Int` to `Param.Unsigned`.
]

Change-Id: I4ce79c72d74445b12acf68e0a54e13966e30041c
Co-authored-by: Richard Cooper <richard.cooper@arm.com>
Signed-off-by: Richard Cooper <richard.cooper@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
2024-02-26 10:40:45 +00:00
Nikolaos Kyparissas
1ccdf407cb mem-cache: Added clean eviction check for prefetchers.
pkt->req->isCacheMaintenance() would not include a check
for clean eviction before notifying the prefetcher,
causing gem5 to crash.

Change-Id: I4a56c7384818c63d6e2263f26645e87cef1243cb
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
2024-02-26 10:40:45 +00:00
Richard Cooper
9fe998a8c0 mem-cache: Update default prefetch options.
Update the default prefetch options to achieve out-of-the box
prefetcher performance closer to that which a typical user would
expect. Configurations that set these parameters explicitly will be
unaffected.

The new defaults were identified as part of work on gem5 prefetchers
undertaken by Nikolaos Kyparissas while on internship at Arm.

Change-Id: Ia6c1803c86e42feef01de40c34d928de50fe0bed
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
2024-02-26 10:40:45 +00:00
Richard Cooper
05f33fbef5 mem-cache: Squash prefetch queue entries by block address.
Prefetch queue entries were being squashed by comparing the address
of each queued prefetch against the block address of the demand
access. Only prefetches that happen to fall on a cache-line block
boundary would be squashed.

This patch converts the prefetch addresses to block addresses before
comparison.

Change-Id: I3a80a1e3d752f925595e33edebf5359d2cc67182
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
2024-02-26 10:40:45 +00:00
Yu-Cheng Chang
47f3ad45d3 stdlib: Add get_last_exit_event_code to get m5 exit status code (#890)
Change-Id: I7319437dff24e31f343e71b6b8993f833b62147c
2024-02-23 09:09:28 -08:00