21547 Commits

Author SHA1 Message Date
Alexander Richardson
1bb5d3b99e arch-riscv: Add support for RISC-V semihosting (#681)
See https://github.com/riscv-software-src/riscv-semihosting for the
current specification. Almost all code is shared with the Arm
implementation.

Tested by running some binaries built with
[picolibc](https://github.com/picolibc/picolibc).
2024-04-27 05:12:32 -07:00
Ivana Mitrovic
939d8e28df mem-cache: Fix TreePLRU num leaves error (#1075)
This PR fixes the error noted in #1073.

Change-Id: I5d31c259ac5ee93f46f28b20eda4f58460ba8523
2024-04-26 20:22:20 -07:00
Robert Hauser
1b323a9571 systemc: remove if clause in Gem5ToTlmBridgeBase (#1059)
In the payload event queue in Gem5ToTlmBridgeBase, the phase is checked
twice for BEGIN_RESP. This commit removes the second if clause since it
is unnecessary.

The duplicate if clauses are at line 234 and line 256:

dd2689905f/src/systemc/tlm_bridge/gem5_to_tlm.cc (L234-L267)

please correct me if I am missing something important
2024-04-25 11:15:30 -07:00
Nicholas Mosier
c679c9c127 cpu-o3: prioritize exiting threads when committing (#1056)
Fix #1055. Prioritize committing from exiting threads before we consider
other threads using the specified SMT commit policy. All instructions in
the ROB for exiting threads should already have been squashed. Thus,
this ensures that the ROB instruction queues for all exiting threads
will be empty at the end of the current cycle, avoiding the assertion
failure encountered in #1055.
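
The selection order described above can be sketched as follows. This is a minimal illustration, not the gem5 commit stage: `ThreadState`, `pickCommitThread`, and the `policy` callback are hypothetical names standing in for the real thread bookkeeping and SMT commit policy.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical per-thread state; the real commit stage tracks far more.
struct ThreadState
{
    int tid;
    bool exiting;
};

// Return the thread to commit from this cycle: an exiting thread always
// wins; only when none is exiting do we defer to the SMT commit policy.
int
pickCommitThread(const std::vector<ThreadState> &threads,
                 const std::function<int()> &policy)
{
    for (const ThreadState &t : threads) {
        if (t.exiting)
            return t.tid;
    }
    return policy();
}
```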

Change-Id: Ib0178a1aa6e94bce2b6c49dd87750e82776639dc
2024-04-25 11:15:14 -07:00
Nicholas Mosier
51d546cb06 cpu-o3: Clear current macro-op in fetch if squashing after last micro-op (#1047)
Fix #1042. Clear the current fetch macro-op if the instruction
initiating the squash is the last micro-op in its macro-op.

Change-Id: I77f60334771277e47f19573d4067b3a7bc5488b2
2024-04-25 11:14:58 -07:00
Nicholas Mosier
66decb2e93 mem-ruby: Fix functional reads for MESI Three-Level messages (#1045)
Fix #1044. This patch adds checks for message types (PUTX_COPY, DATA,
DATA_EXCLUSIVE) that contain data blocks but were missing from the
original `functionalRead` method in MESI Three-Level messages.

Change-Id: I0cedc314166c9cc037bf20f5b7fef5552dd1253c
2024-04-25 11:14:37 -07:00
Harshil Patel
d75afeabb1 tests: fix persistence issue in pyunit tests (#1070)
- Fixed patching/mocking of functions and global variables to reset for
each test.
- Uncommented tests as they should pass now.
2024-04-25 10:03:10 -07:00
Giacomo Travaglini
83e55743e1 arch-arm: Add misc_accessor templated functions to read/write regs at different ELs (#1072)
A usual system register read/write pattern is something like the
following

```
switch(el) {
    case EL1:
        tc->readMiscReg(REG_EL1);
        break;
    case EL2:
        tc->readMiscReg(REG_EL2);
        break;
    case EL3:
        tc->readMiscReg(REG_EL3);
        break;
}
```

To avoid repeating these switch statements all over gem5, we define
templated functions that take an accessor struct as a template
parameter. The accessor populates the templated switch construct. We
provide the FAR register accessor as an example. An accessor should
define the following fields: (type, el0, el1, el2, el3)

Example:

```
struct FarAccessor
{
    using type = RegVal;
    static const MiscRegIndex el0 = NUM_MISCREGS;
    static const MiscRegIndex el1 = MISCREG_FAR_EL1;
    static const MiscRegIndex el2 = MISCREG_FAR_EL2;
    static const MiscRegIndex el3 = MISCREG_FAR_EL3;
};
```
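
A minimal sketch of how such an accessor could drive a single templated switch. The `ThreadContext` stub, the enum definitions, and the `readRegister` signature below are simplified stand-ins for illustration and are not the actual gem5 definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Simplified stand-ins for gem5 types (assumptions for illustration).
using RegVal = std::uint64_t;
enum MiscRegIndex
{
    MISCREG_FAR_EL1,
    MISCREG_FAR_EL2,
    MISCREG_FAR_EL3,
    NUM_MISCREGS
};
enum ExceptionLevel { EL0, EL1, EL2, EL3 };

struct ThreadContext
{
    std::map<MiscRegIndex, RegVal> regs;
    RegVal readMiscReg(MiscRegIndex idx) { return regs[idx]; }
};

struct FarAccessor
{
    using type = RegVal;
    static const MiscRegIndex el0 = NUM_MISCREGS; // no EL0 variant
    static const MiscRegIndex el1 = MISCREG_FAR_EL1;
    static const MiscRegIndex el2 = MISCREG_FAR_EL2;
    static const MiscRegIndex el3 = MISCREG_FAR_EL3;
};

// One switch, written once, parameterized by the accessor.
template <typename Accessor>
typename Accessor::type
readRegister(ThreadContext *tc, ExceptionLevel el)
{
    MiscRegIndex idx = NUM_MISCREGS;
    switch (el) {
      case EL0: idx = Accessor::el0; break;
      case EL1: idx = Accessor::el1; break;
      case EL2: idx = Accessor::el2; break;
      case EL3: idx = Accessor::el3; break;
    }
    // In this sketch NUM_MISCREGS marks "no register at this EL".
    assert(idx != NUM_MISCREGS);
    return tc->readMiscReg(idx);
}
```

A caller would then write `readRegister<FarAccessor>(tc, el)` instead of repeating the switch.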
2024-04-25 14:57:10 +01:00
Andreas Sandberg
85d21b5718 cpu-kvm: Support perf counters on hybrid host architectures (#1065)
Fix #1064 by adding support for hardware performance counters on hybrid
architectures like Intel Alder Lake.

Hybrid architectures have multiple types of cores, each of which requires
the instantiation of a separate performance counter. The KVM CPU's
PerfKvmCounter class was not aware of this, and only instantiated a
single performance counter, implicitly bound to the P-core only. This
meant that if gem5 ever ran on an E-core, the various hardware
performance counters would not get updated properly, in some cases
always reading zero (e.g., for the number of instructions executed).

This patch adds support for hybrid host architectures as follows. First,
we convert PerfKvmCounter into an abstract class, which has two concrete
implementations: SimplePerfKvmCounter and HybridPerfKvmCounter. The
former is used for non-hybrid architectures or for non-hardware
performance counters and is functionally equivalent to the prior
implementation of PerfKvmCounter. The latter is used for instantiating
hardware performance counters (i.e., of type PERF_TYPE_HARDWARE) on
hybrid host architectures. It does so by internally instantiating two
SimplePerfKvmCounters, one for a P-core and one for an E-core. Upon
read, it sums the results of reading the two internal counters.
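
The class split described above can be sketched roughly as below. The class names are shortened and hypothetical, and each simple counter holds a plain value instead of a perf event file descriptor, so this shows only the summing structure and not the real PerfKvmCounter API.

```cpp
#include <cassert>
#include <cstdint>

// Abstract interface shared by both kinds of counter.
class PerfCounterSketch
{
  public:
    virtual ~PerfCounterSketch() = default;
    virtual std::uint64_t read() const = 0;
};

// Stand-in for a counter bound to a single core type. A real counter
// would read its value from the kernel; here it is just a field.
class SimpleCounterSketch : public PerfCounterSketch
{
  public:
    explicit SimpleCounterSketch(std::uint64_t v) : value(v) {}
    std::uint64_t read() const override { return value; }

  private:
    std::uint64_t value;
};

// Hybrid counter: one simple counter per core type, summed on read.
class HybridCounterSketch : public PerfCounterSketch
{
  public:
    HybridCounterSketch(std::uint64_t p_events, std::uint64_t e_events)
        : pCore(p_events), eCore(e_events)
    {}

    std::uint64_t
    read() const override
    {
        return pCore.read() + eCore.read();
    }

  private:
    SimpleCounterSketch pCore;
    SimpleCounterSketch eCore;
};
```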

Change-Id: If64fcb0e2fcc1b3a6a37d77455c2b21e1fc81150
2024-04-25 10:45:47 +01:00
Giacomo Travaglini
a3d030d161 arch-arm: Add the FAR_EL* register accessor
Use it accordingly in the faulting/exception logic

Change-Id: I2f6360d04698b6fb7188e776f1d6966e99ce19b1
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
2024-04-25 09:45:54 +01:00
Giacomo Travaglini
19628e746d arch-arm: Add readRegister/writeRegister templates
This is adding two templated functions for reading/writing
system registers (MiscRegs). It is introducing them inside
a new misc_regs namespace.

Change-Id: I21233337c057673d46d1147971ebabbfc2c2bb6a
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
2024-04-25 09:45:00 +01:00
Giacomo Travaglini
01602cdf13 tests: Revert "tests: Move the arm+ruby tests to not use ALL" (#1069)
This reverts commit c1de2b8762. We revert
the commit as Ruby does not use get_runtime_isa anymore after [1]

[1]: https://github.com/gem5/gem5/pull/241

Change-Id: Iaac8d64194bbd53a9b1a57a796ff92f763c75a87
2024-04-24 21:01:53 -07:00
Bobby R. Bruce
b83a53e521 tests: Fix gem5 testlib compilation (#1063)
Prior to this patch the usage of KConfig was creating an empty config in
the case where a protocol was not specified.
2024-04-24 21:01:30 -07:00
Ivana Mitrovic
cc3655cdad arch-arm: Refactor PTW (#1060)
This PR is refactoring the Arm PageTableWalker in the following way:

1) Simplifying the currState handling logic (mainly the tear down)
2) Amending the TlbTestInterface APIs to use a RequestPtr reference
3) Use finalizePhysical even when MMU is off, which means allowing
memory mapped m5ops to work also in that circumstance
2024-04-24 21:00:42 -07:00
Nicholas Mosier
ed8a09303a mem-cache: Remove power-of-2 requirement for TreePLRU num leaves (#1061)
Remove the requirement in TreePLRU's implementation that the number of
leaves (i.e., the number of cache ways) be a power of two. Firstly, on
some recent processors, this is not the case---for example, Intel Golden
Cove's L1D has 12 ways. Secondly, the implementation of TreePLRU appears
to work just fine as-is with a way count that's not a power of two.
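
To see why a non-power-of-two leaf count can work, consider the sketch below. It is an illustration under its own assumptions (a heap-indexed tree and hypothetical names) and is not the gem5 TreePLRU code. With n leaves, the n - 1 internal nodes sit at indices 0 to n - 2 and each has two children, so the walk from the root always ends on a valid leaf.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tree-PLRU over a heap-indexed binary tree with
// num_leaves - 1 internal nodes followed by num_leaves leaves.
class TreePlruSketch
{
  public:
    explicit TreePlruSketch(std::size_t num_leaves)
        : n(num_leaves), bits(num_leaves > 1 ? num_leaves - 1 : 0, false)
    {
        assert(num_leaves >= 2);
    }

    // Mark `leaf` as recently used: every ancestor is set to point at
    // the subtree that does NOT contain the leaf.
    void
    touch(std::size_t leaf)
    {
        assert(leaf < n);
        std::size_t node = leaf + n - 1;
        while (node != 0) {
            const std::size_t parent = (node - 1) / 2;
            const bool is_right = (node == 2 * parent + 2);
            bits[parent] = !is_right; // true means "victim is on the right"
            node = parent;
        }
    }

    // Follow the bits from the root until a leaf index is reached.
    std::size_t
    victim() const
    {
        std::size_t node = 0;
        while (node < n - 1)
            node = 2 * node + (bits[node] ? 2 : 1);
        return node - (n - 1);
    }

  private:
    std::size_t n;
    std::vector<bool> bits;
};
```

For example, with 12 ways the victim is always a valid way index and is never the way that was touched last.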

Change-Id: If2a27dc5bbe7a8e96684f79ce791df5c0b582230
2024-04-24 20:59:06 -07:00
Giacomo Travaglini
bf78579fa5 arch-arm: Change the TlbTestInterface to accept a RequestPtr
Now that the Request has been made an Extensible object, it
can carry within itself much more data. It makes sense
to pass it to the TlbTestInterface as more information about
the table walk can be extracted from it.

This is also aligning with the testTranslation utility which
is expecting a request reference as first argument.

Change-Id: I3dbc9a81d6b4bcc1801246ba7eb4136774d8f3c7
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
2024-04-24 18:12:36 +01:00
Giacomo Travaglini
89323c5112 arch-arm: Group testTranslation and finalizeTranslation together
They both make final checks to the VA->PA translation before
relinquishing control back to the translate client (usually
CPU code)

Change-Id: Ib0a9da25404248c22c6a240817d2f50f0913fdf7
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
2024-04-24 18:12:36 +01:00
Giacomo Travaglini
0c20eb3ec7 arch-arm: Call finalizePhysical even when MMU is off
The finalizePhysical is just checking if the physical
address falls within the m5op region (if using mmapped
m5ops). There's no reason why we shouldn't enable it
with virtual memory off

Change-Id: I5ab80fd4e7886743abd4b7d85937b72253b578d3
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
2024-04-24 18:12:36 +01:00
Giacomo Travaglini
a299d2db0c arch-arm: Move testWalk check within the fetchDescriptor
We also unify the fault handling logic; rather than cleaning
up the WalkerState in several places scattered throughout the
walking code, we handle faults in the top level method

Change-Id: Ia22fb6f27044ff445fffbab228777a48efa473cb
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
2024-04-24 18:12:36 +01:00
Giacomo Travaglini
6d0cb6eaa3 arch-arm: Pull out Request generation from the TableWalker::Port
Change-Id: Ie8c309bb79b4ce7c656428660c9e2effd58a89f0
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
2024-04-24 18:12:36 +01:00
Giacomo Travaglini
e450cfef16 arch-arm: Move testWalk functionality to the TableWalker class
It's more efficient to pass a reference of the tester to the
TableWalkers. In this way a table walk check is tested directly
from the walkers instead of going through the MMU every time.

Change-Id: I9820dbabb8b551981005a65efa54a76b1a027541
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
2024-04-24 18:12:36 +01:00
Giacomo Travaglini
bbe5bf2644 arch-arm: Simplify TableWalker::walk method
Change-Id: Ib823b3b577a70f6ec14de854cb9c250faa04e932
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
2024-04-24 18:12:36 +01:00
Giacomo Travaglini
9d9b7848bb arch-arm: Properly compute EL even in stage2 walks
This is done in order to differentiate between EL0 (unprivileged) and
EL1. Effectively it won't change much as most of the decisions are
now taken according to the translation regime which will be the
same regardless (EL10)

Change-Id: I218037e9c19cf638aff05c51869e439204d9af69
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
2024-04-24 18:12:36 +01:00
Nicholas Mosier
cf5ec880c9 cpu-kvm: Support overflows when migrating across hybrid cores
Add support for event overflows when the host thread migrates across
different types of cores on a hybrid host architecture. This patch
achieves this by simply halving the sample period for each performance
counter. Since there are two types of cores, this guarantees that an
overflow event will trigger before N events occur, where N is the
requested period (e.g., number of instructions to simulate). This
may result in many early triggers (up to log2(N)) before the requested
period is reached. However, gem5's existing bookkeeping logic already
handles this case properly: if fewer events than requested occurred,
it will set a new period (N - observed) and resume execution. This loop
will exit once N events have actually occurred.
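
The retry loop can be simulated with a short sketch. It assumes the worst case for early triggers, where every event lands on a single core type, and it rounds the halved period up so it is never zero. Both choices, and the function name, are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Count how many overflow triggers occur before `requested` events have
// been observed, when each counter is armed with half the remaining period.
unsigned
triggersUntilDone(std::uint64_t requested)
{
    std::uint64_t observed = 0;
    unsigned triggers = 0;
    while (observed < requested) {
        const std::uint64_t remaining = requested - observed;
        // Half the remaining period, rounded up (never zero).
        const std::uint64_t period = (remaining + 1) / 2;
        // All events hit one counter, so it overflows after `period`.
        observed += period;
        ++triggers;
        assert(observed <= requested);
    }
    return triggers;
}
```

Under these assumptions a request for 1000 events needs 10 triggers, which is on the order of log2(N).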

Change-Id: Iff85237da1ae1aa25bc2045fbf9091726291fe36
2024-04-24 09:47:46 -07:00
Nicholas Mosier
30ea15009f cpu-kvm: Support perf counters on hybrid host architectures
Fix #1064 by adding support for hardware performance counters on hybrid
architectures like Intel Alder Lake.

Hybrid architectures have multiple types of cores, each of which requires
the instantiation of a separate performance counter. The KVM CPU's
PerfKvmCounter class was not aware of this, and only instantiated a
single performance counter, implicitly bound to the P-core only. This
meant that if gem5 ever ran on an E-core, the various hardware
performance counters would not get updated properly, in some cases
always reading zero (e.g., for the number of instructions executed).

This patch adds support for hybrid host architectures as follows. First,
we convert PerfKvmCounter into an abstract class, which has two
concrete implementations: SimplePerfKvmCounter and HybridPerfKvmCounter.
The former is used for non-hybrid architectures or for non-hardware
performance counters and is functionally equivalent to the prior
implementation of PerfKvmCounter. The latter is used for instantiating
hardware performance counters (i.e., of type PERF_TYPE_HARDWARE) on
hybrid host architectures. It does so by internally instantiating two
SimplePerfKvmCounters, one for a P-core and one for an E-core. Upon
read, it sums the results of reading the two internal counters.

Change-Id: If64fcb0e2fcc1b3a6a37d77455c2b21e1fc81150
2024-04-24 09:47:46 -07:00
Bobby R. Bruce
9f5c97c7fd stdlib: Tests Fix/Disable pyunit tests (#1067)
2024-04-23 22:05:19 -07:00
Harshil Patel
5658eec958 tests: update mocking for tests
- After removal of the ClientWrapper class, the mocking of clients needs
  to be changed to the _create_client function.
- Commented out failing tests due to persistence issues.
  The newly mocked clients are not used because the older
  clients persist across tests.

Change-Id: Ie342c9fc8103504dd12f49ae30b3bf62d189ce1d
2024-04-23 16:26:44 -07:00
Harshil Patel
d548f2c5c4 tests: fix tests that use JSON client
- There was a bug in JSONClient when searching
  for resources. The id was not checked and
  the booleans were not set to true when
  optional search queries like resource_version
  and gem5_version are not passed.

Change-Id: I4aa7c5388035144ec6864d57130ad09e6709692e
2024-04-23 16:24:09 -07:00
Harshil Patel
97a0530452 stdlib: Enable bundled resource requests from the databases (#779)
2024-04-22 11:53:23 -07:00
Bobby R. Bruce
40fdf368d8 util: Enable m5term Apple Mac OS Compilation (#1046)
The "linux/limits.h" equivalent on Apple systems is "sys/syslimits.h".
By adding an include guard to include the correct header dependent on
the host system, we can compile m5term on Mac OS systems.
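
A minimal sketch of such a host-dependent include. The `__APPLE__` check and the helper function are assumptions made for illustration; the actual guard in m5term may be written differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Pick the header that defines PATH_MAX for the host system.
#if defined(__APPLE__)
#include <sys/syslimits.h>
#else
#include <linux/limits.h>
#endif

// Hypothetical helper: return the length of `path`, checking that it
// fits in a PATH_MAX-sized buffer including the terminator.
inline std::size_t
checkedPathLength(const char *path)
{
    const std::size_t len = std::strlen(path);
    assert(len < PATH_MAX);
    return len;
}
```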
2024-04-22 11:31:16 -07:00
Bobby R. Bruce
dd2689905f misc,tests: Remove zip step from Workflows (#1048)
This is not needed: with upload-artifact v4, directories are archived and
compressed by default.

This zip step was also causing Daily/Weekly test failures due to not
running `apt update` before the `apt install` for the zip utility. Ergo
this patch fixes these errors.
2024-04-21 09:15:20 -07:00
Matthew Poremba
c54039da5b configs: GPUFS: Turn off SSE4 and fancy XSAVEs (#1041)
A user reported a bug with the SSE4.1 version of memcmp in libc. When
enabled, the simulated program crashes with SIGILL. Since none of the
fixes recommended by the Intel SDM resolved the crash, this patch turns
the bit off instead.

Similarly, the default XSAVE functionality is not completely implemented
for AVX and newer ISA extensions. Therefore, there is not much point to
claiming to support the more advanced versions of XSAVE (XSAVEOPT,
XSAVEC, XSAVES, and XGETBV with ECX=1).

Note that none of these bits are enabled for non-GPU full system
simulations (see src/arch/x86/X86ISA.py). This only impacts GPUFS
simulations.

Change-Id: I8eb7bf0f2a0a29226095e7889fec9c1e8a65f88f
2024-04-20 11:04:59 -07:00
Bobby R. Bruce
e578f83739 github,tests: Add Pyunit tests to CI GitHub Action Workflow (#1026)
Due to an oversight, the PyUnit tests were not being run as part of the
gem5 CI tests. This was because they are located in "tests/pyunit"
instead of "tests/gem5", where the CI GitHub Action workflow searched
for tests to run and where all other tests reside.

This adds the Pyunit tests as a separate job in the CI GitHub Action's
workflow.
2024-04-19 15:22:04 -07:00
Bobby R. Bruce
13f85b989f stdlib: Fix obtaining of Simpoint Resources
Change-Id: Ic73547c8c4acbe5d8a30a24dd8709cb2e9f6eb5e
2024-04-19 01:54:42 -07:00
Bobby R. Bruce
52a7218bd8 stdlib,tests: Fix test resources entry to the new schema
Change-Id: I77c263315d3e7f15df6f7fd83ab4ad9280faf777
2024-04-18 17:33:30 -07:00
Bobby R. Bruce
b80a04e146 stdlib,tests: Fix mocked_resquest_post - add kwargs
Change-Id: I1c080d42b6f238d2f716c500913dc7576dc13ed6
2024-04-18 17:33:30 -07:00
Bobby R. Bruce
e4ff5df35a tests,stdlib: Fix pyunit tests - Workload -> ShadowResource
Change-Id: I307439334c93851ebe3a78d3a80d048374a0900a
2024-04-18 17:33:30 -07:00
Bobby R. Bruce
29d56d3d65 misc,tests: Add Pyunit tests to CI GitHub Action Workflow
Due to an oversight, the PyUnit tests were not being run as part of the
gem5 CI tests. This was because they are located in "tests/pyunit"
instead of "tests/gem5", where the CI GitHub Action workflow searched
for tests to run and where all other tests reside.

This adds the Pyunit tests as a separate job in the CI GitHub Action's
workflow.

Change-Id: I63d93571fde11c19bf3d281c034eddf4b455ae4e
2024-04-18 17:33:30 -07:00
Bobby R. Bruce
cbf0334762 misc: Fix jq install for testlib-quick-matrix (#1038)
2024-04-18 17:30:53 -07:00
Ivana Mitrovic
42ffa52907 mem-ruby: Implement no_alloc Far Atomics in CHI (#994)
This PR introduces a missing piece of far atomic implementation. This
pull request incorporates several changes:

- Enable 2-level and 4-level (and N-level) cache hierarchies, removing
Atomic_NoWait transactions
- Fix Unique Near policy implementation that raised abort
- Add support for alloc_on_atomic == False. Enables Far Atomics on
systems where the HNF does not allocate evicted lines at LLC (Like in
WriteUpdate).
2024-04-18 11:35:47 -07:00
Ivana Mitrovic
c44b8635ab arch-x86: Movfp account for dataSize=4 (#1024)
Movfp instruction did not account for only copying the lower half of src
register if dataSize is 4.
GitHub Issue: #893 
I used the test code in issue #893 to verify the fix is working.
2024-04-18 10:36:00 -07:00
Bartek Gąsiorzewski
84cba2a8a8 dev: Fix interrupt logic in uart8250 (#1009)
Hi, we've noticed some issues with the Uart8250 device when using it as
the Linux console. Sometimes the Uart interrupt would remain constantly
posted, so Linux would continue to try and handle it, effectively
resulting in an infinite loop. With this patch, I'm no longer seeing any
issues, but my testing has been limited to configurations and workloads
we're interested in at Imagination, so please let me know if there's
some other tests I should run or if you notice any other issues.

This patch fixes several issues with interrupt posting and clearing in
the uart8250 device.

The "status" member variable and the console interrupt should be kept in
sync. However, in one code path in readIir, the interrupt bit was being
cleared in the status variable but not in the platform controller.

Additionally, in some code paths, the interrupts would be cleared in the
status variable and in the interrupt controller, but a future interrupt
would remain scheduled, causing a spurious interrupt and setting a bit
in status to 1.

These issues can confuse the kernel and result in an infinite interrupt
handling loop.

Another issue is related to the fact that there are two interrupt causes
(TX and RX) and both of them can be valid at the same time. When one of
them becomes no longer valid, we should check the status of the other
one before clearing the interrupt.
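
The rule for two interrupt causes can be sketched as follows. The struct, its field names, and the bit values are hypothetical and do not mirror the Uart8250 class; the point is only that the line is deasserted when no cause remains pending.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the pending causes and the interrupt line.
struct UartIrqSketch
{
    static constexpr std::uint8_t RX_INT = 0x1;
    static constexpr std::uint8_t TX_INT = 0x2;

    std::uint8_t status = 0;    // pending interrupt causes
    bool lineAsserted = false;  // what the interrupt controller sees

    // Record a cause and assert the line.
    void
    post(std::uint8_t cause)
    {
        status = static_cast<std::uint8_t>(status | cause);
        lineAsserted = true;
    }

    // Clear one cause; deassert the line only if the other cause is not
    // pending, so `status` and the controller stay in sync.
    void
    clear(std::uint8_t cause)
    {
        status = static_cast<std::uint8_t>(status & ~cause);
        if (status == 0)
            lineAsserted = false;
    }
};
```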

This patch addresses the issues listed above and refactors the interrupt
clearing logic to reduce repetition.
2024-04-17 11:27:39 -07:00
Jason Lowe-Power
c13aa7727d cpu: Fix Ruby/x86 pio port connections (#1035)
Fixes #1033

In the BaseCPU object _uncached_interrupt_response_ports is a class
variable, not an instance variable. #1004 changed the explicit
self._uncached_interrupt_response_ports to use extend. This caused the
list of ports to be extended *for all cores*, which caused problems when
using a system with more than 1 core.

This reverts the `extend` part of the change, but keeps the rest.

Change-Id: I6dc7d6da6763048d82960229d34933a3a2ac36e0

Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
2024-04-17 08:20:04 -07:00
Yu-Cheng Chang
6b4dbdcedb tests,arch-riscv: update bitmanip asmtest binaries (#931)
Gem5 resource update: https://github.com/gem5/gem5-resources/pull/25
Gem5 issue: https://github.com/gem5/gem5/issues/883

Change-Id: I1892d7591d6fa49d0563623fd90292e0d38d9ba3
2024-04-16 09:51:32 -07:00
Lukas Zenick
01a5edc86e arch-x86: Use mbits function for clarity
Change-Id: I577ee55752f917e561e4c741ba7a19f0229318b5
2024-04-15 22:49:41 -05:00
Matthew Poremba
9b463dbdfd util-docker: Bump gpu-fs build docker to ROCm 6.0.2 (#1025)
This bumps the docker image used to build GPU applications for input to
GPUFS simulations from ROCm 5.4.2 to ROCm 6.0.2 and Ubuntu from 20.04 to
22.04. This matches the versions in gem5-resources#29.

Several notes were added to the Dockerfile to describe where the RUN
commands come from. A README.md is also added to clarify that this is
not a disk image for GPUFS and is only used to build applications.

Change-Id: I9ada99e2ed1854cb7adb76f2a1fa662bab398f86
2024-04-15 13:36:06 -07:00
Bobby R. Bruce
1aa0bf8ec6 tests,github: Update CI Tests' GitHub Actions versions (#1021)
2024-04-15 13:35:33 -07:00
Bobby R. Bruce
56a2346b8d tests,util-docker,github: Add Ubuntu 24.04 Docker image & updated tests/actions to use it (#1018)
This ensures gem5 compiles and runs in 24.04 environments. This PR is
necessary for ensuring gem5 supports Ubuntu 24.04 (related issue: #909).
2024-04-15 13:34:22 -07:00
Matthew Poremba
a03319bef7 arch-vega: Fix output warnings, gem5.fast (#1023)
Fix gem5.fast build not building when using gpu model.

Removes very spammy stat distribution bucket size prints when running
gpu model.
2024-04-15 13:18:27 -07:00
Matthew Poremba
7e2d8dee42 mem,gpu-compute: Implement GPU TCC directed invalidate (#1011)
The GPU device currently supports large BAR which means that the driver
can write directly to GPU memory over the PCI bus without using SDMA or
PM4 packets. The gem5 PCI interface only provides an atomic interface
for BAR reads/writes, which means the values cannot go through timing
mode Ruby caches. This causes bugs as the TCC cache is allowed to keep
clean data between kernels for performance reasons. If there is a BAR
write directly to memory bypassing the cache, the value in the cache is
stale and must be invalidated.

In this commit a TCC invalidate is generated for all writes over PCI
that go directly to GPU memory. This will also invalidate TCP along the
way if necessary. This currently relies on the driver synchronization
which only allows BAR writes in between kernels. Therefore, the cache
should only be in I or V state.

To handle a race condition between invalidates and launching the next
kernel, the invalidates return a response and the GPU command processor
will wait for all TCC invalidates to be complete before launching the
next kernel.
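
One way to picture the wait is a counter of outstanding invalidates that gates the launch. The class below is a hypothetical sketch and not the GPU command processor code; `sent`, `responseReceived`, and `requestLaunch` are made-up names.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical bookkeeping for in-flight TCC invalidates.
class InvalidateTrackerSketch
{
  public:
    // Called when an invalidate is issued.
    void sent() { ++outstanding; }

    // Called when an invalidate response arrives; runs a deferred
    // launch once nothing is outstanding.
    void
    responseReceived()
    {
        assert(outstanding > 0);
        --outstanding;
        if (outstanding == 0 && pendingLaunch) {
            std::function<void()> launch = std::move(pendingLaunch);
            pendingLaunch = nullptr;
            launch();
        }
    }

    // Launch immediately if no invalidate is in flight, else defer.
    void
    requestLaunch(std::function<void()> launch)
    {
        if (outstanding == 0)
            launch();
        else
            pendingLaunch = std::move(launch);
    }

  private:
    unsigned outstanding = 0;
    std::function<void()> pendingLaunch;
};
```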

This fixes issues with stale data in nanoGPT and possibly PENNANT.
2024-04-15 13:18:01 -07:00