2637 Commits

Author SHA1 Message Date
Pranith
50f652a2ee Implement BTB using the cache library (#1537)
This enables the BTB to be associative and use various replacement
policies.
2024-10-10 17:05:22 +01:00
Erin (Jianghua) Le
feeb3b2d67 cpu: fix simInsts and simOps not resetting (#1615)
This PR fixes the bug where simInsts and simOps don't reset when
m5.stats.reset() is called. The stats hostInstRate and hostOpRate are
affected by this change as well, as they depend on simInsts and simOps
respectively.

This is related to issue 1443 linked
[here](https://github.com/gem5/gem5/issues/1443).
2024-10-09 19:49:43 -07:00
Yu-Cheng Chang
402a030ce1 cpu,arch,arch-riscv: Check wake up signal when post interrupt (#1641)
The RISC-V specification does not describe how to handle wake-up from an
interrupt signal. In the SiFive U74 core, the hart wakes up if there is
any enabled pending interrupt [1].

[1] Section 14.3.1
https://sifive.cdn.prismic.io/sifive/ad5577a0-9a00-45c9-a5d0-424a3d586060_u74_core_complex_manual_21G3.pdf
2024-10-08 08:51:38 -07:00
Matthew Poremba
4f7b3ed827 mem-ruby: Remove static methods from RubySystem (#1453)
There are several parts to this PR, which works towards #1349.

(1) Make RubySystem::getBlockSizeBytes non-static by providing ways to
access the block size or passing the block size explicitly to classes.

The main changes are:
 - DataBlocks must be explicitly allocated. A default ctor still exists
   to avoid needing to heavily modify SLICC. The size can be set using a
   realloc function, operator=, or copy ctor. This is handled completely
   transparently meaning no protocol or config changes are required.
 - WriteMask now requires block size to be set. This is also handled
   transparently by modifying the SLICC parser to identify WriteMask
   types and call setBlockSize().
 - AbstractCacheEntry and TBE classes now require block size to be set.
   This is handled transparently by modifying the SLICC parser to
   identify these classes and call initBlockSize() which calls
   setBlockSize() for any DataBlock or WriteMask.
 - All AbstractControllers now have a pointer to RubySystem. This is
   assigned in SLICC generated code and requires no changes to protocol
   or configs.
 - The Ruby Message class now requires block size in all constructors.
   This is added to the argument list automatically by the SLICC parser.
   
(2) Relax dependence on common functions in
src/mem/ruby/common/Address.hh
so that RubySystem::getBlockSizeBits is no longer static. Many classes
already have a way to get block size from the previous commit, so they
simply multiply by 8 to get the number of bits. For handling SLICC and
reducing the number of changes, define makeCacheLine, getOffset, etc. in
RubyPort and AbstractController. The only protocol changes required are
to replace any "RubySystem::foo()" calls with "m_ruby_system->foo()".

For classes which do not have a way to get access to block size but
still use makeLineAddress, getOffset, etc., the block size must be
passed to that class. This requires some changes to the SimObject
interface for two commonly used classes: DirectoryMemory and
RubyPrefetcher, resulting in user-facing API changes.

User-facing API changes:
 - DirectoryMemory and RubyPrefetcher now require the cache line size as
   a non-optional argument.
 - RubySequencer SimObjects now require RubySystem as a non-optional
   argument.
 - TesterThread in the GPU ruby tester now requires the cache line size
   as a non-optional argument.

(3) Removes static member variables in RubySystem which control
randomization, cooldown, and warmup. These are mostly used by the Ruby
Network. The network classes are modified to take these former static
variables as parameters which are passed to the corresponding method
(e.g., enqueue, delayHead, etc.) rather than needing a RubySystem object
at all.

Change-Id: Ia63c2ad5cf0bf9d1cbdffba5d3a679bb4d3b1220

(4) There are two major SLICC generated static methods:
getNumControllers()
on each cache controller which returns the number of controllers created
by the configs at run time and the functions which access this method,
which are MachineType_base_count and MachineType_base_number. These need
to be removed to create multiple RubySystem objects otherwise NetDest,
version value, and other objects are incorrect.

To remove the static requirement, MachineType_base_count and
MachineType_base_number are moved to RubySystem. Any class which needs
to call these methods must now have a pointer to a RubySystem. To enable
that, several changes are made:
 - RubyRequest and Message now require a RubySystem pointer in the
   constructor. The pointer is passed to fields in the Message class
   which require a RubySystem pointer (e.g., NetDest). SLICC is modified
   to do this automatically.
 - SLICC structures may now optionally take an "implicit constructor"
   which can be used to call a non-default constructor for locally
   defined variables (e.g., temporary variables within SLICC actions). A
   statement such as "NetDest bcast_dest;" in SLICC will implicitly
   append a call to the NetDest constructor taking RubySystem, for
   example.
 - RubySystem gets passed to Ruby network objects (Network, Topology).
2024-10-08 08:14:50 -07:00
Giacomo Travaglini
4a3e2633d2 cpu-o3: Add Matrix OpDesc to the O3 Default FU (#1640)
There was a bug exposed by a recent PR [1] where until recently the O3
CPU was executing an instruction even if it did not have the required
functional unit in the FU pool.

We are adding the matrix descriptors to the Default FU pool in the O3
cpu so that no panic is encountered upon execution of a matrix
instruction.

[1]: https://github.com/gem5/gem5/pull/1516

Change-Id: I04250255a2cbb2ee6f3ef204b62bc2c1ee2d4d2c

Reviewed-by: Richard Cooper <richard.cooper@arm.com>

Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-10-08 10:23:14 +01:00
Giacomo Travaglini
440999e447 cpu-o3: Add Crypto OpDesc to the O3 Default FU (#1639)
There was a bug exposed by a recent PR [1] where until recently the O3
CPU was executing an instruction even if it did not have the required
functional unit in the FU pool.

We are adding the crypto descriptors to the Default FU pool in the O3
cpu so that no panic is encountered upon execution of a crypto
instruction.

[1]: https://github.com/gem5/gem5/pull/1516

Change-Id: Ifaf2f8e4780dfb8ba825a99a02dd587f011dbd23

Reviewed-by: Richard Cooper <richard.cooper@arm.com>

Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-10-08 10:22:25 +01:00
aperais
e970acb9d2 cpu-o3: Replace integral constants by named constants in FU pool (#1556)
This replaces hardcoded integral values with more explicit constant
names in the code allocating functional units to instructions.

This commit follows ba5886aee7 which
should have read:

"If an instruction requires a functional unit that is not present in the
model (e.g., because it is not present in the configuration), O3CPU
treats it as a 1-cycle operation.

This commit changes the behavior to make the cpu panic when this
happens. The cpu panics only if the instruction reaches the head of the
ROB, meaning it is ok to have unsupported instructions on the wrong
path.

Thanks to Chandana S. Deshpande (deshpande.s.chandana@gmail.com) for
finding the issue."

Change-Id: I5e0a37e5fb8404cb5496bd2cb0a9a5baeae3b895

Co-authored-by: Arthur perais <arthur.perais@univ-grenoble-alpes.fr>
2024-09-12 14:04:34 +01:00
aperais
ba5886aee7 cpu-o3: Panic if no FU exists for an instruction needing to issue (#1516)
At present, if an instruction requires a functional unit that is not
present in the O3CPU config, O3CPU treats it as a 1-cycle operation that
does not consume an FU. This seems like a silent failure: if I forgot
to add an FU for a new operation type I added, then I don't want it to
silently work "for free".

The problem is that the code treats the FU allocator returning
`NoCapableFU` for a given DynInst as equivalent to the case where the
DynInst obtained an FU, with default latency of 1. This is because there
is a single if statement that checks whether the FU allocator returned
`NoFreeFU` or not, and `NoCapableFU` happens to be different. The change
is to introduce `NoNeedFU` and to panic if the FU allocator returns
`NoCapableFU`
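
A minimal Python sketch of the check described above, for illustration only. The real code is C++ in the O3 instruction queue; the numeric values and the function names `issue_old` and `issue_new` are hypothetical.

```python
# Hypothetical sentinel values; the real constants live in gem5's C++ FU pool code.
NO_CAPABLE_FU = -2  # no unit in the pool can ever execute this op class
NO_FREE_FU = -1     # a capable unit exists but is busy this cycle
NO_NEED_FU = -3     # the op class does not need a functional unit


def issue_old(fu_idx):
    """Old behavior: a single check, so NO_CAPABLE_FU falls through as 'issued'."""
    if fu_idx != NO_FREE_FU:
        return "issued"
    return "stalled"


def issue_new(fu_idx):
    """New behavior: a missing unit is an error instead of a free 1-cycle op."""
    if fu_idx == NO_CAPABLE_FU:
        raise RuntimeError("Processor cannot execute this op class")
    if fu_idx == NO_FREE_FU:
        return "stalled"
    return "issued"  # either a real FU index or NO_NEED_FU
```

With the old check, `issue_old(NO_CAPABLE_FU)` silently reports the instruction as issued; the new check raises instead.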

An improvement would be to use a strongly typed enum rather than integer
constants. Thoughts ?

In addition to unit tests, I have tested this with `main.py run` and get
panics if I remove support for `IntMul` type in `O3CPU.py` in:

```
./SuiteUID-asm-riscv-rv32um-ps-mul-o3-ALL-x86_64-opt/TestUID-asm-riscv-rv32um-ps-mul-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-asm-riscv-rv32um-ps-mulh-o3-ALL-x86_64-opt/TestUID-asm-riscv-rv32um-ps-mulh-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-asm-riscv-rv32um-ps-mulhsu-o3-ALL-x86_64-opt/TestUID-asm-riscv-rv32um-ps-mulhsu-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-asm-riscv-rv32um-ps-mulhu-o3-ALL-x86_64-opt/TestUID-asm-riscv-rv32um-ps-mulhu-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-asm-riscv-rv64um-ps-mul-o3-ALL-x86_64-opt/TestUID-asm-riscv-rv64um-ps-mul-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-asm-riscv-rv64um-ps-mulh-o3-ALL-x86_64-opt/TestUID-asm-riscv-rv64um-ps-mulh-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-asm-riscv-rv64um-ps-mulhsu-o3-ALL-x86_64-opt/TestUID-asm-riscv-rv64um-ps-mulhsu-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-asm-riscv-rv64um-ps-mulhu-o3-ALL-x86_64-opt/TestUID-asm-riscv-rv64um-ps-mulhu-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-asm-riscv-rv64um-ps-mulw-o3-ALL-x86_64-opt/TestUID-asm-riscv-rv64um-ps-mulw-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-BaseCPUProcessor-arm-hello-ALL-x86_64-opt/TestUID-BaseCPUProcessor-arm-hello-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-cpu_test_ArmDerivO3CPU_Bubblesort-ALL-x86_64-opt/TestUID-cpu_test_ArmDerivO3CPU_Bubblesort-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-cpu_test_ArmDerivO3CPU_FloatMM-ALL-x86_64-opt/TestUID-cpu_test_ArmDerivO3CPU_FloatMM-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-cpu_test_RiscvDerivO3CPU_Bubblesort-ALL-x86_64-opt/TestUID-cpu_test_RiscvDerivO3CPU_Bubblesort-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-cpu_test_RiscvDerivO3CPU_FloatMM-ALL-x86_64-opt/TestUID-cpu_test_RiscvDerivO3CPU_FloatMM-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-o3-cpu_1-cores_classic_DualChannelDDR3_1600_arm_boot_test_to-tick-ALL-x86_64-opt/TestUID-o3-cpu_1-cores_classic_DualChannelDDR3_1600_arm_boot_test_to-tick-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-o3-cpu_1-cores_classic_DualChannelDDR3_1600_riscv-boot-test_to-tick-ALL-x86_64-opt/TestUID-o3-cpu_1-cores_classic_DualChannelDDR3_1600_riscv-boot-test_to-tick-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-test-arm-hello32-static-o3-ALL-x86_64-opt/TestUID-test-arm-hello32-static-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-test-arm-hello64-static-o3-ALL-x86_64-opt/TestUID-test-arm-hello64-static-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-test-mips-hello-o3-ALL-x86_64-opt/TestUID-test-mips-hello-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-test-riscv-hello-o3-ALL-x86_64-opt/TestUID-test-riscv-hello-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-test-riscv-print-this-o3-ALL-x86_64-opt/TestUID-test-riscv-print-this-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
```

Co-authored-by: Arthur perais <arthur.perais@univ-grenoble-alpes.fr>
2024-09-11 16:43:31 +01:00
MMysore2
33e3bc4ff1 Updating Traffic Generators (#1416)
Added documentation for `strided_generator.py` and
`strided_generator_core.py`.

Updated clarity of documentation for `linear_generator.py`,
`linear_generator_core.py`, `random_generator.py`, and
`random_generator_core.py`.

Made `max_addr` exclusive instead of inclusive for strided and linear
traffic generation in `strided_gen.cc` and `linear_gen.cc`.
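
As a rough illustration of what an exclusive `max_addr` means, here is a Python sketch of a half-open address range. The function name and the block-size stepping are assumptions; this is not the C++ code in `linear_gen.cc`.

```python
def linear_addresses(min_addr, max_addr, block_size):
    """Return block addresses in the half-open range [min_addr, max_addr)."""
    # range() already excludes its stop value, matching an exclusive max_addr.
    return list(range(min_addr, max_addr, block_size))
```

Under these assumptions, `linear_addresses(0, 256, 64)` gives `[0, 64, 128, 192]`: address 256 itself is never generated.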
2024-08-08 12:46:10 -07:00
Saili Karkare
bd228af5cf Updating hex addr printing (#1385)
This change updates how addresses are printed when the TrafficGen
DebugFlag is enabled. Previously, hex strings were printed without a
preceding 0x. This change adds the prefix to distinguish between decimal
and hex.
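
For illustration, the same distinction in Python (the helper name is hypothetical; the actual change is in gem5's C++ debug output):

```python
def format_addr(addr):
    """Format an address as hex with a leading 0x so it cannot be read as decimal."""
    return f"{addr:#x}"


# Without the prefix, 0x1000 prints as "1000", which looks like a decimal number.
unprefixed = f"{4096:x}"
prefixed = format_addr(4096)
```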
2024-08-07 02:31:21 -07:00
Yu-Cheng Chang
c13f895af0 arch,cpu: Implement generic reset method for MMU (#1342)
Implementing a generic reset method for the MMU allows each ISA to
implement its own reset method. The default MMU reset method flushes all
TLB entries. For example, RISC-V needs to reset the PMP when it receives
the reset signal, but the TLBs do not need to be flushed.

Change-Id: I158261570fb6e5216ec105fbdc53460f83f88d15
2024-07-30 09:47:55 +01:00
Yu-Cheng Chang
ce8db85867 cpu: Add cpuIdlePins to indicate the threadContext of CPU is idle (#1285)
If the threadContext of the CPU enters suspend mode, raise the
cpu_idle_pins pin for that threadContext's threadID, sending a high signal
to the target. If the threadContext of the CPU enters active mode, lower
the cpu_idle_pins pin for that threadID, sending a low signal to the target.
2024-07-10 10:36:37 +01:00
Bobby R. Bruce
7137b73ca0 cpu: Fix std::min type mismatch in reg_class.hh (#1266)
Introduced in #1234, this caused compilation to fail on Apple Silicon
systems. This bug is the same as #582, where a more detailed explanation
is provided.
2024-06-20 13:02:08 -07:00
Mahyar Samani
7ff1e381c9 cpu,stdlib: Fix Access Trace for Accessing Indices in SpatterGen (#1258)
This change fixes the way indices are generated in a multi-generator
setup.
Instead of all cores generating the same trace of indices for
accessing the index array, each core now generates an interleaved subset
of indices.
For example, see the traces below (indices into the index array) for a
2-core setup.

Before:
core_0: 0, 1, 2, 3, 4, 5, 6, 7, ...
core_1: 0, 1, 2, 3, 4, 5, 6, 7, ...
After:
core_0: 0, 1, 2, 3, 8, 9, 10, 11, ...
core_1: 4, 5, 6, 7, 12, 13, 14, 15, ...

Additionally, this change fixes the SpatterKernel class in the standard
library to comply with the change in the SpatterGen source code.
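
The "After" traces above can be reproduced with a short Python sketch. The chunk size of 4 is inferred from the example traces, and the function name is hypothetical; this is not the SpatterKernel partitioning code itself.

```python
def core_indices(core_id, num_cores, chunk, total):
    """Return the interleaved subset of [0, total) owned by one core."""
    out = []
    # Each core takes `chunk` consecutive indices, then skips the other cores' chunks.
    for start in range(core_id * chunk, total, num_cores * chunk):
        out.extend(range(start, min(start + chunk, total)))
    return out
```

With 2 cores, a chunk of 4 and 16 indices, core 0 gets `[0, 1, 2, 3, 8, 9, 10, 11]` and core 1 gets `[4, 5, 6, 7, 12, 13, 14, 15]`.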
2024-06-20 11:24:44 -07:00
Bobby R. Bruce
36f73f671d cpu,stdlib: Adding Spatter (#1136)
This PR adds source code for C++ implementation of SpatterGen as well as
SpatterKernel. SpatterGen uses a PyBindMethod to add kernels to the
backend code. This way the process of processing json files could be
offloaded to python. In addition it adds standard library components for
SpatterGenCore and SpatterGen. These two components follow the same
structure as AbstractCore and AbstractProcessor. In addition
spatter_kernel.py adds a definition for SpatterKernel in python to make
adding kernels to C++ easier. Also it adds utility functions for parsing
dictionaries read from json as well as partitioning traces for multicore
setups.
2024-06-17 15:28:45 -07:00
Hoa Nguyen
15e0236a8b arch,cpu,sim: Add mechanism to partially print vector regs (#1234)
Currently, gem5's inst tracer prints the whole vector register container
by default. The size of vector register containers in gem5 is the
maximum size allowed by the ISA. For vector-length agnostic (VLA) vector
registers, this means ARM SVE vector container is 2048 bits long, and
RISC-V vector container is 65536 bits long. Note that VLA implementation
in gem5 allows the vector length to be varied within the limit specified
by the ISAs.

However, in most use cases of gem5, the vector length is much less than
65536 bits. This causes two issues: (1) the vector container requires
allocating and moving around a large amount of unused data while only a
fraction of it is used, and (2) printing the execution trace of a vector
register results in a wall of text with a small amount of useful data.

This change addresses problem (2) by providing a mechanism to limit
the amount of data printed by the instruction tracer. This is done by
adding a function printing the first X bits of a vector register
container, where X is the vector length determined at runtime, as
opposed to the vector container size, which is determined at compilation
time.
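
A small Python sketch of the idea, assuming a byte-array container and a vector length that is a multiple of 8 bits. The function name and the byte order of the output are illustrative assumptions, not gem5's tracer format.

```python
def format_vector_reg(container, vlen_bits):
    """Print only the first vlen_bits of a register container as hex."""
    # The container is sized for the ISA maximum; only the runtime length is useful.
    used_bytes = vlen_bits // 8
    return container[:used_bytes].hex()


# A 2048-bit (256-byte) container holding a 32-bit vector.
container = bytes([0xDE, 0xAD, 0xBE, 0xEF]) + bytes(252)
short = format_vector_reg(container, 32)
full = format_vector_reg(container, 2048)
```

Here `short` is 8 hex characters while `full` is 512, most of them zeros.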

Change-Id: I815fa5aa738373510afcfb0d544a5b19c40dc0c7

---------

Signed-off-by: Hoa Nguyen <hn@hnpl.org>
2024-06-17 14:05:47 -07:00
Giacomo Travaglini
2804311f7b cpu-o3: Revert "Do not set Executed on load instruction to be replayed" (#1251)
Reverts gem5/gem5#1182

This is breaking O3 execution. Investigating the matter
2024-06-17 12:24:43 -07:00
Mahyar Samani
6695e5ef70 cpu: Adding SpatterGen
This change adds source code for SpatterGen ClockedObject.
The set of source code pushed includes code for SpatterKernel
that tracks whether information is being gathered or scattered
as well as the list of indices to be accessed. This model
has a PyBindMethod to add SpatterKernels from Python.
This way all the preparations for kernels can be done in Python.
SpatterGen has a few parameters that model limits on a few
hardware resources in the backend of a processor, e.g. the number
of functional units to calculate effective addresses, the latency
of calculating an effective address, and the number of integer registers.

Change-Id: I451ffb385180a914e884cab220928c5f1944b2e3
2024-06-14 10:45:09 -07:00
Minje Jun
b8e21a2d32 cpu-o3: Do not set Executed on load instruction to be replayed (#1182)
A load instruction can be replayed when
1) it's strictly ordered or
2) it falls into load-store forwarding mismatch.

Case 1 was handled in the executeLoad function but case 2 was not. This
caused a case-2 replayed load instruction to violate the assertion
"assert(!load_inst->isExecuted())" in LSQUnit::read. This
commit fixes the problem by also handling case 2 in
LSQUnit::executeLoad.

Co-authored-by: Minje Jun <minje.jun@samsung.com>
2024-06-14 10:12:26 -07:00
Jason Lowe-Power
21ffd91529 cpu,arch: Add IsInvalid flag to Unknown insts (#1071)
The IsInvalid flag indicates that the static instruction is not part of
the executing ISA and not part of m5's pseudo-instructions. This flag
provides a way to recognize an illegal instruction at the decode stage.
2024-06-13 16:26:35 -07:00
Harshil Patel
74afea471d cpu: Revert "Don't change to suspend if the thread status is halted" (#1225)
Reverts gem5/gem5#1039
2024-06-12 00:20:06 -07:00
Hoa Nguyen
369029d2be cpu: Add IsInvalid flag to StaticInstFlags
The IsInvalid flag indicates that the static instruction is not part
of the executing ISA and not part of m5's pseudo-instructions. This
flag provides a way to recognize an illegal instruction at the decode
stage.

Change-Id: I2779c6edcd8c5e6a77ea11cad3ff73bacb79d800
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
2024-06-11 18:48:29 +00:00
Ivana Mitrovic
a764b9be1c Revert "arch-x86: Fix TLB Assertion Error on CFLUSH" (#1196)
Reverts gem5/gem5#1080 as it is not a good fix.
2024-06-04 10:26:53 -07:00
Lukas Zenick
dad5c7b6f7 arch-x86: Fix TLB Assertion Error on CFLUSH (#1080)
Fixed the assertion statement in the cpu's translation.hh file so that
it doesn't fail the assertion if the cache is clean.

I compile this C code to `test`:
```c
#include <stdio.h>

static inline void clflush(volatile void *p) {
    __asm__ volatile ("clflush (%0)" : : "r"(p) : "memory");
}

int main() {
    int data = 42;  // Example variable

    printf("Value before clflush: %d\n", data);

    clflush(&data);

    printf("Value after clflush: %d\n", data);

    return 0;
}
```
And run it with this script
`./build/X86/gem5.opt configs/learning_gem5/part1/two_level.py ./test`
In order to verify that it no longer fails the assertion check.

GitHub Issue: #862 
Change-Id: I6004662e7c99f637ba0ddb07d205d1657708e99f
2024-06-03 10:17:10 -07:00
Harshil Patel
0824d7f2cd Revert "cpu-kvm: Support perf counters on hybrid host architectures" (#1127)
Reverts gem5/gem5#1065

Reverting this change because this PR breaks X86 kvm as mentioned in the
issue #1126.
2024-05-21 08:14:10 -07:00
Yu-Cheng Chang
321bd07163 cpu: Don't change to suspend if the thread status is halted (#1039)
In our gem5 model, four states represent the status of a thread context:
Active, Suspend, Halting and Halted.

5641c5e464/src/cpu/thread_context.hh (L99-L117)

When the gem5 instance is initialized, all of the thread contexts are set
to Halted. The status of a thread context does not become Active until
the Workload initializes it at startup, except with the StubWorkload. So
if the user uses the StubWorkload and the CPU is connected to the
model_reset port, the thread context of the CPU may be activated.

The following steps activate the thread context of the CPU without
Workload[1] initialization or lowering the model_reset port[2].

1. Raise the model_reset port (Change the state from Halted to Suspend)
5641c5e464/src/cpu/base.cc (L671-L673)

2. Post the interrupt to CPU (Change the state from Suspend to Active)
5641c5e464/src/cpu/base.cc (L231-L239)

Implementation of wakeup

SimpleCPU:

5641c5e464/src/cpu/simple/base.cc (L251-L259)

MinorCPU:

5641c5e464/src/cpu/minor/cpu.cc (L143-L151)

O3CPU:

5641c5e464/src/cpu/o3/cpu.cc (L1337-L1346)

This CL fixes the issue that occurs when raising the model_reset port to
the CPU (putting the CPU to sleep) if the CPU has not been activated by
the workload. If the CPU status is Halted, it should not change to
Suspend, to avoid a wake-up.

Reference

The model_reset is introduced in the CL:
https://gem5-review.googlesource.com/c/public/gem5/+/67574/4

[1] Activate by workload (ARM example):

5641c5e464/src/arch/arm/fs_workload.cc (L101-L114)

[2] Lower the model_reset:

5641c5e464/src/cpu/base.cc (L191-L192)
5641c5e464/src/cpu/base.cc (L674-L685)

Change-Id: I5bfc0b7491d14369fff77b98b71c0ac763fb7c42
2024-05-16 10:02:53 -07:00
OdnetninI (Eduardo José Gómez Hernández)
17cbbd84ae cpu: Indirect predictor track conditional indirect (#1077)
As discussed in https://github.com/orgs/gem5/discussions/954: 

In the refactor made by commit f65df9b959, conditional indirect
branches are no longer updated in the indirect predictor.
This kind of branch exists in neither x86 nor Arm, but it is
present in PowerPC.

This patch enables the indirect predictor to track this kind of
branch.
2024-04-29 11:38:22 +01:00
Nicholas Mosier
c679c9c127 cpu-o3: prioritize exiting threads when committing (#1056)
Fix #1055. Prioritize committing from exiting threads before we consider
other threads using the specified SMT commit policy. All instructions in
the ROB for exiting threads should already have been squashed. Thus,
this ensures that the ROB instruction queues for all exiting threads
will be empty at the end of the current cycle, avoiding the assertion
failure encountered in #1055.

Change-Id: Ib0178a1aa6e94bce2b6c49dd87750e82776639dc
2024-04-25 11:15:14 -07:00
Nicholas Mosier
51d546cb06 cpu-o3: Clear current macro-op in fetch if squashing after last micro-op (#1047)
Fix #1042. Clear the current fetch macro-op if the instruction
initiating the squash is the last micro-op in its macro-op.

Change-Id: I77f60334771277e47f19573d4067b3a7bc5488b2
2024-04-25 11:14:58 -07:00
Nicholas Mosier
cf5ec880c9 cpu-kvm: Support overflows when migrating across hybrid cores
Add support for event overflows when the host thread migrates across
different types of cores on a hybrid host architecture. This patch
achieves this by simply halving the sample period for each performance
counter. Since there are two types of cores, this guarantees that an
overflow event will trigger before N events occur, where N is the
requested period (e.g., number of instructions to simulate). This
may result in many early triggers (up to log2(N)) before the requested
period is reached. However, gem5's existing bookkeeping logic already
handles this case properly: if fewer events than requested occurred,
it will set a new period (N - observed) and resume execution. This loop
will exit once N events have actually occurred.
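
The loop described above can be simulated with a short Python sketch. Everything here is an illustrative model, not the KVM CPU code: the function name, the `core_of` callback that says which core type an event lands on, and the choice to round the halved period up are all assumptions.

```python
import math


def run_until(n_events, core_of):
    """Count events until exactly n_events have occurred, using halved periods.

    core_of(i) returns "P" or "E": the core type on which the i-th event runs.
    """
    observed_total = 0
    rounds = 0
    i = 0
    while observed_total < n_events:
        remaining = n_events - observed_total
        # Each per-core-type counter gets half of the remaining period.
        period = math.ceil(remaining / 2)
        counts = {"P": 0, "E": 0}
        # Run until either counter overflows; the sum can never exceed `remaining`.
        while counts["P"] < period and counts["E"] < period:
            counts[core_of(i)] += 1
            i += 1
        observed_total += counts["P"] + counts["E"]
        rounds += 1
    return observed_total, rounds
```

When all events land on one core type, each round observes about half of what is left, which is where the logarithmic number of early triggers comes from. When events alternate between core types, far fewer rounds are needed.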

Change-Id: Iff85237da1ae1aa25bc2045fbf9091726291fe36
2024-04-24 09:47:46 -07:00
Nicholas Mosier
30ea15009f cpu-kvm: Support perf counters on hybrid host architectures
Fix #1064 by adding support for hardware performance counters on hybrid
architectures like Intel Alder Lake.

Hybrid architectures have multiple types of cores, each of which requires
the instantiation of a separate performance counter. The KVM CPU's
PerfKvmCounter class was not aware of this, and only instantiated a
single performance counter, implicitly bound to the P-core only. This
meant that if gem5 ever ran on an E-core, the various hardware
performance counters would not get updated properly, in some cases
always reading zero (e.g., for the number of instructions executed).

This patch adds support for hybrid host architectures as follows. First,
we convert PerfKvmCounter into an abstract class, which has two
concrete implementations: SimplePerfKvmCounter and HybridPerfKvmCounter.
The former is used for non-hybrid architectures or for non-hardware
performance counters and is functionally equivalent to the prior
implementation of PerfKvmCounter. The latter is used for instantiating
hardware performance counters (i.e., of type PERF_TYPE_HARDWARE) on
hybrid host architectures. It does so by internally instantiating two
SimplePerfKvmCounters, one for a P-core and one for an E-core. Upon
read, it sums the results of reading the two internal counters.
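
The class structure described above can be sketched in Python as follows. The class and method names are simplified stand-ins for the C++ classes, and the `add` method is a hypothetical way to feed the counters in place of real perf events.

```python
from abc import ABC, abstractmethod


class PerfCounter(ABC):
    """Abstract interface, standing in for the abstract PerfKvmCounter."""

    @abstractmethod
    def read(self):
        ...


class SimpleCounter(PerfCounter):
    """One underlying counter, as on a non-hybrid host."""

    def __init__(self):
        self.count = 0

    def add(self, n):
        # Hypothetical hook used here to simulate events being counted.
        self.count += n

    def read(self):
        return self.count


class HybridCounter(PerfCounter):
    """Two internal counters, one per core type; read() sums them."""

    def __init__(self):
        self.p_core = SimpleCounter()
        self.e_core = SimpleCounter()

    def read(self):
        return self.p_core.read() + self.e_core.read()
```

Callers only depend on `read()`, so they do not need to know whether the host is hybrid.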

Change-Id: If64fcb0e2fcc1b3a6a37d77455c2b21e1fc81150
2024-04-24 09:47:46 -07:00
Jason Lowe-Power
c13aa7727d cpu: Fix Ruby/x86 pio port connections (#1035)
Fixes #1033

In the BaseCPU object _uncached_interrupt_response_ports is a class
variable, not an instance variable. #1004 changed the explicit
self._uncached_interrupt_response_ports to use extend. This caused the
list of ports to be extended *for all cores*, which caused problems when
using a system with more than 1 core.

This reverts the `extend` part of the change, but keeps the rest.
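
The pitfall is a general Python one and can be shown without any gem5 code. The class names below are hypothetical and only mimic the shape of the problem.

```python
class SharedPorts:
    ports = []  # class variable: one list shared by every instance

    def add(self, new_ports):
        # extend() mutates the shared class-level list in place.
        self.ports.extend(new_ports)


class OwnPorts:
    ports = []  # class-level default, never mutated

    def add(self, new_ports):
        # Assignment creates an instance attribute, leaving the class list alone.
        self.ports = self.ports + new_ports


a, b = SharedPorts(), SharedPorts()
a.add(["pio"])

c, d = OwnPorts(), OwnPorts()
c.add(["pio"])
```

After this runs, `b.ports` also contains `"pio"` even though only `a` was modified, while `d.ports` is still empty.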

Change-Id: I6dc7d6da6763048d82960229d34933a3a2ac36e0

Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
2024-04-17 08:20:04 -07:00
Yu-Cheng Chang
ebb70dea99 cpu: Fix KVM false negative warning after Kconfig transition (#1013)
When we start to build gem5, we read and process all of the SConsopts
files, and run the after_sconsopts_callbacks after all of the SConsopts
files have been read.

The KVM_ISA env variable can be set in different
files. Taking x86 and arm as examples:

KVM_ISA default value:

bc39283451/src/cpu/kvm/SConsopts

x86 KVM_ISA:

bc39283451/src/arch/x86/kvm/SConsopts (L39-L45)

arm KVM_ISA:

bc39283451/src/arch/arm/kvm/SConsopts (L35-L36)

We should emit the KVM warning only after all SConsopts env settings have been read.

issue: https://github.com/gem5/gem5/issues/686

Change-Id: I096c6bebaaec18f9b2af93191d0dd23c65084eda
2024-04-12 09:23:56 -07:00
Nicholas Mosier
bc39283451 cpu-o3, arch-x86: initialize interrupts for all SMT threads (#1007)
Fix issue #1004. When enabling SMT with the O3 cpu, only the first
interrupts object was getting initialized properly. This patch
initializes all interrupts objects, one per SMT thread.

Change-Id: I300782b645bd8ea3ef2497278fb73125ab4bf495
2024-04-11 11:17:24 -07:00
Ivan Fernandez
c91d1253de cpu: This commit updates cpu FUs according to new Simd types
This commit updates the cpu by removing the VectorXXX types and updating
the FUs according to the newer SimdXXX ones. This is part of the
homogenization of RISC-V vector instruction types, which moved
from VectorXXX to SimdXXX.

Change-Id: I84baccd099b73a11cf26dd714487a9f272671d3d
2024-03-25 19:01:47 +01:00
Ivan Fernandez
1e743fd85a arch-riscv: adding vector unit-stride segment stores to RISC-V (#913)
This commit adds support for vector unit-stride segment store operations
for RISC-V (vssegXeXX). This implementation is based on two types of
microops:
- VsSegIntrlv microops that properly interleave source registers into
structs.
- VsSeg microops that store data in memory as contiguous structs of
several fields.
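
As a data-layout illustration only, the interleaving step can be sketched in Python with lists standing in for vector registers. The function name is hypothetical and this does not model the microops themselves.

```python
def interleave_fields(field_regs):
    """Turn one register per field into a flat sequence of contiguous structs."""
    # zip(*...) groups element i of every field register into one struct.
    return [elem for struct in zip(*field_regs) for elem in struct]
```

For two field registers `[1, 2, 3]` and `[10, 20, 30]`, the result is `[1, 10, 2, 20, 3, 30]`, which is the order the structs would have in memory.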

Change-Id: Id80dd4e781743a60eb76c18b6a28061f8e9f723d

Gem5 issue: https://github.com/gem5/gem5/issues/382
2024-03-22 15:45:58 -07:00
Ivan Fernandez
f6c61836b3 arch-riscv: adding vector unit-stride segment loads to RISC-V (#851)
This commit adds support for vector unit-stride segment load operations
for RISC-V (vlseg<NF>e<X>). This implementation is based on two types of
microops:
- VlSeg microops that load data as it is organized in memory in structs
of several fields.
- VectorDeIntrlv microops that properly deinterleave structs into
destination registers.
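
As a data-layout illustration only, the deinterleaving step can be sketched in Python with lists standing in for memory and registers. The function name is hypothetical and this does not model the microops themselves.

```python
def deinterleave_structs(memory, num_fields):
    """Split a flat sequence of structs into one list per field."""
    # Field f is every num_fields-th element, starting at offset f.
    return [memory[f::num_fields] for f in range(num_fields)]
```

For structs of two fields laid out as `[1, 10, 2, 20, 3, 30]`, this yields `[[1, 2, 3], [10, 20, 30]]`, one list per destination register.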

Gem5 issue: https://github.com/gem5/gem5/issues/382
2024-03-06 11:27:06 -08:00
Giacomo Travaglini
8759131df3 cpu-o3, arch: Fix SMT bug arising from v23.0 and make gem5 more robust with SMT (#828)
This PR fixes https://github.com/gem5/gem5/issues/668. The first commit
fixes it for all ISAs other than Arm by setting the
number of architectural Matrix registers to 0 for those ISAs which do
not use them.

It then partly fixes it for Arm as well with the 2nd commit: by removing
RenameMap::numFreeEntries we don't stall renaming unless a matrix
instruction is encountered... This means most binaries will run with SMT
as long as they don't use FEAT_SME instructions. Please note: this is
not simply an SMT fix, it will generally address a shortcoming in the way
we were renaming instructions.

If an Arm binary wants to use SMT with FEAT_SME, the 4th commit will
make sure the lack of physical registers is reported explicitly at the
beginning of simulation, rather than silently blocking renaming.
2024-02-19 08:52:31 +00:00
Arnabjyoti Kalita
b826d96f40 cpu-o3: add PerThreadUnifiedThreadMap to O3 CPU (#842)
Github issue: https://github.com/gem5/gem5/issues/373

Change-Id: I1c8aba9bc5ea4e45faa6c174780904b8bd618604
2024-02-12 09:26:31 -08:00
Giacomo Travaglini
4eb0cd44fc cpu-o3: Restrict constraint on number of physical registers
Having the number of physical registers matching exactly the number of
architectural ones does not guarantee a proper execution as it means the
freeList would have 0 registers available for renaming. In this case the
worst would happen: renaming would silently stall execution
indefinitely. With this change we report the issue to the user and fail
execution.
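
The constraint can be sketched as a simple configuration check in Python. The function name, the error type and the message are assumptions for illustration; the actual check is in the O3 CPU's C++ code.

```python
def check_phys_regs(num_phys, num_arch):
    """Fail early if no physical register would be left over for renaming."""
    if num_phys <= num_arch:
        raise ValueError(
            f"{num_phys} physical registers for {num_arch} architectural "
            "ones leaves the free list empty; renaming would stall forever"
        )
    # Registers beyond the architectural ones are what the free list starts with.
    return num_phys - num_arch
```

With equal counts the check raises immediately, instead of letting the simulation hang with an empty free list.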

Change-Id: I1eb968802f1a1a5115012f44b541542a682f887d
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-02-02 21:18:32 +00:00
Giacomo Travaglini
1fb7c1ad7e cpu-o3: Rename numFreeEntries into minFreeEntries
Change-Id: I89faeb001ebdcbc90ea88508f8d231ec6e7fe197
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-02-02 18:11:47 +00:00
Giacomo Travaglini
86158de220 cpu-o3: Stop using RenameMap::numFreeEntries
The method extracts the minimum number of non-zero free
registers/entries across all register classes [1]. This means that if we
have saturated all register storage for a particular class, renaming
will stop as a whole.

I believe it makes sense to keep renaming and only block renaming
when an instruction requiring the particular register type is
encountered. This would happen in the Rename::renameInsts method [2].
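
A simplified Python sketch of the difference between the two policies. The dictionaries of free entries per register class and both function names are hypothetical, and the minimum here is taken over all classes without the non-zero filtering mentioned above.

```python
def min_free_entries(free):
    """Global view: one exhausted class makes the whole rename stage look full."""
    return min(free.values())


def can_rename(free, needed):
    """Per-instruction view: only the classes this instruction uses matter."""
    return all(free[cls] >= count for cls, count in needed.items())


free = {"int": 10, "float": 8, "matrix": 0}
int_inst_ok = can_rename(free, {"int": 2})
matrix_inst_ok = can_rename(free, {"matrix": 1})
```

With no free matrix registers, `min_free_entries(free)` is 0, yet an integer-only instruction can still be renamed; only the matrix instruction is blocked.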

[1]: https://github.com/gem5/gem5/blob/stable/src/cpu/o3/rename_map.hh#L269
[2]: https://github.com/gem5/gem5/blob/stable/src/cpu/o3/rename.cc#L662

Change-Id: I932826a77a5c0b2e05d8fdcab0e6ca13cf0e3d23
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-02-02 18:11:47 +00:00
Mahyar Samani
b79fe82e5c cpu,stdlib: Updating strided generator (#762)
This change improves the strided generator so that it can create
traces with more flexibility.
It allows the user to set the offset and stride size manually instead of
calculating them from a "gen_id".
This way different patterns can be created with the same SimObject.
In addition, this change adds stdlib components for the strided generator.
2024-02-01 09:08:42 -08:00
Matthew Poremba
63caa780c2 misc: Remove all references to GCN3
Replace instances of "GCN3" with Vega. Remove gfx801 and gfx803. Rename
FIJI to Vega and Carrizo to Raven.

Using misc since there is not enough room to fit all the tags.

Change-Id: Ibafc939d49a69be9068107a906e878408c7a5891
2024-01-17 11:11:06 -06:00
Bobby R. Bruce
213d0b0bfe cpu: 'suppressFuncErrors' -> 'pkt->suppressFuncError()' fix
Change-Id: If4aa71e9f6332df2a3daa51b69eaad97f6603f6b
2023-12-20 09:15:15 -08:00
Hoa Nguyen
7a5052b3a0 arch-arm: Only build ArmCapstoneDisassembler when ISA is arm (#553)
Currently, if the Capstone header file is found on the host system,
scons will try to build the ArmCapstoneDisassembler regardless of the
gem5 target ISA. This causes problems when the host has Capstone but
the gem5 target ISA is not arm. Compiling gem5 in this case will cause
errors, e.g., ArmISA and ArmSystem are not found.

This change aims to prevent building the ArmCapstoneDisassembler when
the gem5 target ISA is not arm.

Ref:
[1] The Arm Capstone PR https://github.com/gem5/gem5/pull/494

Change-Id: I1e714d34aec8fe2a2af8cd351536951053a4d8a5
2023-12-03 13:22:11 -08:00
Richard Cooper
2fbbdad618 base: Add encapsulation to the loader::Symbol class
This commit converts `gem5::loader::Symbol` to a full class with
private members, enforcing encapsulation. Until now client code has
been able to (and does) access members directly.

This change will enable class invariants to be enforced via accessor
methods.

Change-Id: Ia0b5b080d4f656637a211808e13dce1ddca74541
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
2023-12-01 22:00:26 +00:00
Andreas Sandberg
dcdebec0f6 misc,python: Add isort hook to pre-commit (#431)
2023-11-30 09:54:12 +00:00
Bobby R. Bruce
d11c40dcac misc: Run pre-commit run --all-files
This ensures `isort` is applied to all files in the repo.

Change-Id: Ib7ced1c924ef1639542bf0d1a01c5737f6ba43e9
2023-11-29 22:06:41 -08:00
Adrià Armejach
eb13b32314 cpu-o3: Fix discarded requests str-ld forwarding (#614)
With the use of large RVV vectors (i.e., 8K or 16K bits) and a limited
number of cacheLoadPorts, some loads take multiple cycles to execute.
This triggered certain conditions when store-to-load forwarding happens
in the middle of the execution of a load that already has outstanding
packets.

First, after store-to-load forwarding the request is marked as discarded
and the load is immediately written back, which triggers a writebackDone
that tries to delete the request, triggering an assert as it still has
outstanding packets. This patch avoids deleting the request, leaving it
self-owned; it will be deleted when the last packet arrives in
packetReplied.

Second, this patch avoids checking snoops on discarded requests by
checking if the request exists.

Change-Id: Icea0add0327929d3a6af7e6dd0af9945cb0d0970

Co-authored-by: Adrià Armejach <adria.armejach@bsc.es>
2023-11-29 08:45:03 -08:00