derek/gem5 - gem5 - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Pranith	50f652a2ee	Implement BTB using the cache library (#1537) This enables the BTB to be set-associative and to use various replacement policies.	2024-10-10 17:05:22 +01:00
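To make "associative with replacement policies" concrete, here is a minimal sketch of a set-associative BTB using LRU replacement. The class name, set-index function, and use of the full PC as the tag are hypothetical simplifications; this is not gem5's BTB or cache-library code, and LRU is only one of the policies such a design could plug in.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical set-associative BTB with LRU replacement (illustrative only).
class SetAssocBTB
{
  public:
    SetAssocBTB(std::size_t num_sets, std::size_t num_ways)
        : numSets(num_sets), numWays(num_ways), entries(num_sets * num_ways)
    {}

    // Return the predicted target for `pc`, refreshing its LRU age on a hit.
    std::optional<uint64_t>
    lookup(uint64_t pc)
    {
        Entry *e = find(pc);
        if (!e)
            return std::nullopt;
        e->lastUse = ++tick;
        return e->target;
    }

    // Insert or update `pc`; prefer an invalid way, else evict the LRU way.
    void
    update(uint64_t pc, uint64_t target)
    {
        Entry *victim = find(pc);
        if (!victim) {
            std::size_t base = setIndex(pc) * numWays;
            victim = &entries[base];
            for (std::size_t w = 0; w < numWays; ++w) {
                Entry &cand = entries[base + w];
                if (!cand.valid) {
                    victim = &cand;
                    break;
                }
                if (cand.lastUse < victim->lastUse)
                    victim = &cand;
            }
        }
        *victim = Entry{true, pc, target, ++tick};
    }

  private:
    struct Entry
    {
        bool valid = false;
        uint64_t pc = 0;       // full PC used as the tag for simplicity
        uint64_t target = 0;
        uint64_t lastUse = 0;  // larger value means more recently used
    };

    // Hypothetical index function: drop 2 low bits, then modulo the set count.
    std::size_t setIndex(uint64_t pc) const { return (pc >> 2) % numSets; }

    Entry *
    find(uint64_t pc)
    {
        std::size_t base = setIndex(pc) * numWays;
        for (std::size_t w = 0; w < numWays; ++w) {
            Entry &e = entries[base + w];
            if (e.valid && e.pc == pc)
                return &e;
        }
        return nullptr;
    }

    std::size_t numSets, numWays;
    std::vector<Entry> entries;
    uint64_t tick = 0;
};
```

With one set and two ways, touching entry A before inserting a third branch makes B the LRU victim, so B is evicted while A survives.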
Erin (Jianghua) Le	feeb3b2d67	cpu: fix simInsts and simOps not resetting (#1615) This PR fixes the bug where simInsts and simOps are not reset when m5.stats.reset() is called. The stats hostInstRate and hostOpRate are also affected by this change, as they depend on simInsts and simOps respectively. This is related to issue #1443, linked [here](https://github.com/gem5/gem5/issues/1443).	2024-10-09 19:49:43 -07:00
Yu-Cheng Chang	402a030ce1	cpu,arch,arch-riscv: Check wake up signal when post interrupt (#1641) The RISC-V specification does not define how to handle waking up from an interrupt signal. In the SiFive U74 core, the hart wakes up if there is any enabled pending interrupt [1]. [1] Section 14.3.1, https://sifive.cdn.prismic.io/sifive/ad5577a0-9a00-45c9-a5d0-424a3d586060_u74_core_complex_manual_21G3.pdf	2024-10-08 08:51:38 -07:00
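The U74 wake-up rule described above reduces to a bitwise test between the pending-interrupt and interrupt-enable CSRs. The sketch below assumes the raw `mip` and `mie` values are available as integers; the function name is hypothetical and does not correspond to gem5's actual implementation.

```cpp
#include <cstdint>

// Hypothetical helper: a sleeping hart wakes if any interrupt is both
// pending (bit set in mip) and enabled (same bit set in mie), following
// the SiFive U74 behaviour cited above.
bool
shouldWake(uint64_t mip, uint64_t mie)
{
    return (mip & mie) != 0;
}
```

For example, a pending machine timer interrupt (bit 7, `0x80`) wakes the hart only if bit 7 is also set in `mie`.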
Matthew Poremba	4f7b3ed827	mem-ruby: Remove static methods from RubySystem (#1453) There are several parts to this PR to work towards #1349.
(1) Make RubySystem::getBlockSizeBytes non-static by providing ways to access the block size or passing the block size explicitly to classes. The main changes are:
- DataBlocks must be explicitly allocated. A default ctor still exists to avoid needing to heavily modify SLICC. The size can be set using a realloc function, operator=, or copy ctor. This is handled completely transparently, meaning no protocol or config changes are required.
- WriteMask now requires block size to be set. This is also handled transparently by modifying the SLICC parser to identify WriteMask types and call setBlockSize().
- AbstractCacheEntry and TBE classes now require block size to be set. This is handled transparently by modifying the SLICC parser to identify these classes and call initBlockSize(), which calls setBlockSize() for any DataBlock or WriteMask.
- All AbstractControllers now have a pointer to RubySystem. This is assigned in SLICC-generated code and requires no changes to protocols or configs.
- The Ruby Message class now requires block size in all constructors. This is added to the argument list automatically by the SLICC parser.
(2) Relax dependence on common functions in src/mem/ruby/common/Address.hh so that RubySystem::getBlockSizeBits is no longer static. Many classes already have a way to get block size from the previous commit, so they simply multiply by 8 to get the number of bits. For handling SLICC and reducing the number of changes, define makeCacheLine, getOffset, etc. in RubyPort and AbstractController. The only protocol change required is to replace any "RubySystem::foo()" calls with "m_ruby_system->foo()". For classes which do not have a way to access the block size but still use makeLineAddress, getOffset, etc., the block size must be passed to that class. This requires some changes to the SimObject interface for two commonly used classes, DirectoryMemory and RubyPrefetcher, resulting in the following user-facing API changes:
- DirectoryMemory and RubyPrefetcher now require the cache line size as a non-optional argument.
- RubySequencer SimObjects now require RubySystem as a non-optional argument.
- TesterThread in the GPU Ruby tester now requires the cache line size as a non-optional argument.
(3) Remove static member variables in RubySystem which control randomization, cooldown, and warmup. These are mostly used by the Ruby Network. The network classes are modified to take these former static variables as parameters which are passed to the corresponding method (e.g., enqueue, delayHead, etc.) rather than needing a RubySystem object at all. Change-Id: Ia63c2ad5cf0bf9d1cbdffba5d3a679bb4d3b1220
(4) There are two major SLICC-generated static methods: getNumControllers() on each cache controller, which returns the number of controllers created by the configs at run time, and the functions which access this method, MachineType_base_count and MachineType_base_number. These need to be removed to create multiple RubySystem objects; otherwise NetDest, version values, and other objects are incorrect. To remove the static requirement, MachineType_base_count and MachineType_base_number are moved to RubySystem. Any class which needs to call these methods must now have a pointer to a RubySystem. To enable that, several changes are made:
- RubyRequest and Message now require a RubySystem pointer in the constructor. The pointer is passed to fields in the Message class which require a RubySystem pointer (e.g., NetDest). SLICC is modified to do this automatically.
- SLICC structures may now optionally take an "implicit constructor", which can be used to call a non-default constructor for locally defined variables (e.g., temporary variables within SLICC actions). For example, a statement such as "NetDest bcast_dest;" in SLICC will implicitly append a call to the NetDest constructor taking RubySystem.
- RubySystem gets passed to Ruby network objects (Network, Topology).	2024-10-08 08:14:50 -07:00
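Part (2) above passes the block size explicitly instead of reading it from a static RubySystem accessor. A minimal sketch of line-address helpers that take the block size as an argument is shown below; the signatures are assumptions for illustration and may not match the overloads actually added to Address.hh.

```cpp
#include <cassert>
#include <cstdint>

using Addr = uint64_t;

// Hypothetical: convert a power-of-two block size in bytes to log2(bytes).
int
blockSizeBits(int block_size_bytes)
{
    assert(block_size_bytes > 0 &&
           (block_size_bytes & (block_size_bytes - 1)) == 0);
    int bits = 0;
    while ((1 << bits) < block_size_bytes)
        ++bits;
    return bits;
}

// Clear the offset bits to get the address of the containing cache line.
Addr
makeLineAddress(Addr addr, int block_size_bits)
{
    return addr & ~((Addr(1) << block_size_bits) - 1);
}

// Keep only the offset bits within the cache line.
Addr
getOffset(Addr addr, int block_size_bits)
{
    return addr & ((Addr(1) << block_size_bits) - 1);
}
```

With 64-byte lines (6 bits), address `0x1234` maps to line `0x1200` at offset `0x34`. Because the caller supplies the block size, two Ruby systems with different line sizes can use the same helpers.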
Giacomo Travaglini	4a3e2633d2	cpu-o3: Add Matrix OpDesc to the O3 Default FU (#1640) A recent PR [1] exposed a bug: until then, the O3 CPU executed an instruction even if the required functional unit was not in the FU pool. We are adding the matrix descriptors to the Default FU pool in the O3 CPU so that no panic is encountered when executing a matrix instruction. [1]: https://github.com/gem5/gem5/pull/1516 Change-Id: I04250255a2cbb2ee6f3ef204b62bc2c1ee2d4d2c Reviewed-by: Richard Cooper <richard.cooper@arm.com> Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-10-08 10:23:14 +01:00
Giacomo Travaglini	440999e447	cpu-o3: Add Crypto OpDesc to the O3 Default FU (#1639) A recent PR [1] exposed a bug: until then, the O3 CPU executed an instruction even if the required functional unit was not in the FU pool. We are adding the crypto descriptors to the Default FU pool in the O3 CPU so that no panic is encountered when executing a crypto instruction. [1]: https://github.com/gem5/gem5/pull/1516 Change-Id: Ifaf2f8e4780dfb8ba825a99a02dd587f011dbd23 Reviewed-by: Richard Cooper <richard.cooper@arm.com> Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-10-08 10:22:25 +01:00
aperais	e970acb9d2	cpu-o3: Replace integral constants by named constants in FU pool (#1556) This replaces hardcoded integral values with more explicit constant names in the code that allocates functional units to instructions. This commit follows `ba5886aee7`, whose message should have read: "If an instruction requires a functional unit that is not present in the model (e.g., because it is not present in the configuration), O3CPU treats it as a 1-cycle operation. This commit changes the behavior to make the CPU panic when this happens. The CPU panics only if the instruction reaches the head of the ROB, meaning it is OK to have unsupported instructions on the wrong path. Thanks to Chandana S. Deshpande (deshpande.s.chandana@gmail.com) for finding the issue." Change-Id: I5e0a37e5fb8404cb5496bd2cb0a9a5baeae3b895 Co-authored-by: Arthur perais <arthur.perais@univ-grenoble-alpes.fr>	2024-09-12 14:04:34 +01:00
aperais	ba5886aee7	cpu-o3: Panic if no FU exists for an instruction needing to issue (#1516) At present, if an instruction requires a functional unit that is not present in the O3CPU config, O3CPU treats it as a 1-cycle operation that does not consume an FU. This seems like a silent failure: if I forgot to add an FU for a new operation type I added, then I don't want it to silently work "for free". The problem is that the code treats the FU allocator returning `NoCapableFU` for a given DynInst as equivalent to the case where the DynInst obtained an FU with the default latency of 1. This is because a single if statement only checks whether the FU allocator returned `NoFreeFU`, and `NoCapableFU` is a different value, so it falls through to the "FU obtained" path. The change is to introduce `NoNeedFU` and to panic if the FU allocator returns `NoCapableFU`. An improvement would be to use a strongly typed enum rather than integer constants. Thoughts? In addition to unit tests, I have tested this with `main.py run` and get panics if I remove support for the `IntMul` type in `O3CPU.py` in:
```
./SuiteUID-asm-riscv-rv32um-ps-mul-o3-ALL-x86_64-opt/TestUID-asm-riscv-rv32um-ps-mul-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-asm-riscv-rv32um-ps-mulh-o3-ALL-x86_64-opt/TestUID-asm-riscv-rv32um-ps-mulh-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-asm-riscv-rv32um-ps-mulhsu-o3-ALL-x86_64-opt/TestUID-asm-riscv-rv32um-ps-mulhsu-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-asm-riscv-rv32um-ps-mulhu-o3-ALL-x86_64-opt/TestUID-asm-riscv-rv32um-ps-mulhu-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-asm-riscv-rv64um-ps-mul-o3-ALL-x86_64-opt/TestUID-asm-riscv-rv64um-ps-mul-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-asm-riscv-rv64um-ps-mulh-o3-ALL-x86_64-opt/TestUID-asm-riscv-rv64um-ps-mulh-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-asm-riscv-rv64um-ps-mulhsu-o3-ALL-x86_64-opt/TestUID-asm-riscv-rv64um-ps-mulhsu-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-asm-riscv-rv64um-ps-mulhu-o3-ALL-x86_64-opt/TestUID-asm-riscv-rv64um-ps-mulhu-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-asm-riscv-rv64um-ps-mulw-o3-ALL-x86_64-opt/TestUID-asm-riscv-rv64um-ps-mulw-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-BaseCPUProcessor-arm-hello-ALL-x86_64-opt/TestUID-BaseCPUProcessor-arm-hello-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-cpu_test_ArmDerivO3CPU_Bubblesort-ALL-x86_64-opt/TestUID-cpu_test_ArmDerivO3CPU_Bubblesort-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-cpu_test_ArmDerivO3CPU_FloatMM-ALL-x86_64-opt/TestUID-cpu_test_ArmDerivO3CPU_FloatMM-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-cpu_test_RiscvDerivO3CPU_Bubblesort-ALL-x86_64-opt/TestUID-cpu_test_RiscvDerivO3CPU_Bubblesort-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-cpu_test_RiscvDerivO3CPU_FloatMM-ALL-x86_64-opt/TestUID-cpu_test_RiscvDerivO3CPU_FloatMM-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-o3-cpu_1-cores_classic_DualChannelDDR3_1600_arm_boot_test_to-tick-ALL-x86_64-opt/TestUID-o3-cpu_1-cores_classic_DualChannelDDR3_1600_arm_boot_test_to-tick-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-o3-cpu_1-cores_classic_DualChannelDDR3_1600_riscv-boot-test_to-tick-ALL-x86_64-opt/TestUID-o3-cpu_1-cores_classic_DualChannelDDR3_1600_riscv-boot-test_to-tick-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-test-arm-hello32-static-o3-ALL-x86_64-opt/TestUID-test-arm-hello32-static-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-test-arm-hello64-static-o3-ALL-x86_64-opt/TestUID-test-arm-hello64-static-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-test-mips-hello-o3-ALL-x86_64-opt/TestUID-test-mips-hello-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-test-riscv-hello-o3-ALL-x86_64-opt/TestUID-test-riscv-hello-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-test-riscv-print-this-o3-ALL-x86_64-opt/TestUID-test-riscv-print-this-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
```
Co-authored-by: Arthur perais <arthur.perais@univ-grenoble-alpes.fr>	2024-09-11 16:43:31 +01:00
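The issue logic described in the two commits above can be sketched with the strongly typed enum that #1516 suggests as an improvement. All names here are hypothetical, and `std::runtime_error` stands in for gem5's `panic`; this is an illustration of the decision table, not the actual `inst_queue.cc` code.

```cpp
#include <stdexcept>

// Hypothetical scoped enum for FU allocator outcomes.
enum class FUAlloc { Granted, NoNeedFU, NoFreeFU, NoCapableFU };

enum class IssueDecision { IssueOnFU, IssueWithoutFU, RetryLater };

// NoCapableFU only fails once the instruction is at the ROB head, so
// unsupported instructions on a mispredicted path never trigger the error.
IssueDecision
decideIssue(FUAlloc result, bool at_rob_head)
{
    switch (result) {
      case FUAlloc::Granted:
        return IssueDecision::IssueOnFU;
      case FUAlloc::NoNeedFU:
        return IssueDecision::IssueWithoutFU;  // explicit, not a fall-through
      case FUAlloc::NoFreeFU:
        return IssueDecision::RetryLater;      // a capable FU exists but is busy
      case FUAlloc::NoCapableFU:
        if (at_rob_head)
            throw std::runtime_error("Processor cannot execute opclass");
        return IssueDecision::RetryLater;      // may still be squashed
    }
    return IssueDecision::RetryLater;  // unreachable; silences -Wreturn-type
}
```

Because every enumerator is handled by name, a missing FU can no longer silently take the "FU obtained" path.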
MMysore2	33e3bc4ff1	Updating Traffic Generators (#1416) Added documentation for `strided_generator.py` and `strided_generator_core.py`. Improved the clarity of the documentation for `linear_generator.py`, `linear_generator_core.py`, `random_generator.py`, and `random_generator_core.py`. Made `max_addr` exclusive instead of inclusive for strided and linear traffic generation in `strided_gen.cc` and `linear_gen.cc`.	2024-08-08 12:46:10 -07:00
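An exclusive `max_addr` means the generator stops before emitting an address equal to the bound. The function below is a hypothetical sketch of that address sequence, with user-supplied offset and stride as in the strided generator; a linear pattern is the special case of offset 0 and stride equal to the request size. It ignores wrap-around, request sizes, and anything else the real `strided_gen.cc`/`linear_gen.cc` may do.

```cpp
#include <cstdint>
#include <vector>

using Addr = uint64_t;

// Hypothetical address sequence; max_addr is exclusive.
std::vector<Addr>
stridedAddrs(Addr start_addr, Addr max_addr, Addr offset, Addr stride)
{
    std::vector<Addr> addrs;
    if (stride == 0)
        return addrs;  // avoid an infinite loop
    for (Addr a = start_addr + offset; a < max_addr; a += stride)
        addrs.push_back(a);
    return addrs;
}
```

For a 256-byte range with 64-byte steps, this yields 0, 64, 128, and 192; 256 itself is never generated.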
Saili Karkare	bd228af5cf	Updating hex addr printing (#1385) This change updates how addresses are printed when the TrafficGen DebugFlag is enabled. Previously, hex strings were printed without a leading 0x; this change adds the prefix to distinguish hex from decimal values.	2024-08-07 02:31:21 -07:00
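A minimal sketch of prefixed hex formatting with standard iostreams follows. The helper name is hypothetical and is not the TrafficGen code. The literal `"0x"` is used instead of `std::showbase` because `showbase` follows `printf`'s `#` flag and prints a zero value as plain `0`.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: always emit an explicit 0x prefix, even for zero.
std::string
formatAddr(uint64_t addr)
{
    std::ostringstream os;
    os << "0x" << std::hex << addr;
    return os.str();
}
```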
Yu-Cheng Chang	c13f895af0	arch,cpu: Implement generic reset method for MMU (#1342) Implementing a generic reset method for the MMU allows each ISA to implement its own reset method. The default MMU reset method flushes all TLB entries. For example, RISC-V needs to reset the PMP when it receives the reset signal, but its TLBs do not need to be flushed. Change-Id: I158261570fb6e5216ec105fbdc53460f83f88d15	2024-07-30 09:47:55 +01:00
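The design above is a virtual hook with a default body that an ISA can override. The classes below are hypothetical stand-ins, not gem5's `BaseMMU` or RISC-V MMU. The TLB and PMP state are modelled as plain vectors, and the PMP entry count is arbitrary.

```cpp
#include <vector>

// Hypothetical base MMU: by default, reset flushes every TLB entry.
class MMUSketch
{
  public:
    virtual ~MMUSketch() = default;
    virtual void reset() { tlb.clear(); }
    std::vector<int> tlb;  // stand-in for TLB entries
};

// Hypothetical RISC-V-style MMU: reset clears PMP state, TLB untouched.
class RiscvMMUSketch : public MMUSketch
{
  public:
    void reset() override { pmpEntries.assign(pmpEntries.size(), 0); }
    std::vector<int> pmpEntries = std::vector<int>(16, 1);  // arbitrary size
};
```

Callers invoke `reset()` through a base-class reference and get each ISA's behaviour without knowing which ISA is in use.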
Yu-Cheng Chang	ce8db85867	cpu: Add cpuIdlePins to indicate the threadContext of CPU is idle (#1285) When a CPU's threadContext enters the suspended state, the cpu_idle_pins signal for that threadContext's thread ID is raised (driven high) to its target. When the threadContext becomes active again, the corresponding cpu_idle_pins signal is lowered.	2024-07-10 10:36:37 +01:00
Bobby R. Bruce	7137b73ca0	cpu: Fix `std::min` type mismatch in reg_class.hh (#1266) Introduced in #1234, this caused compilation to fail on Apple Silicon systems. This bug is the same as #582, where a more detailed explanation is provided.	2024-06-20 13:02:08 -07:00
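The following is a minimal illustration of this class of error. It does not reproduce the actual reg_class.hh code, and the types involved there are not shown in this log. `std::min` takes two arguments of one deduced type `T`. On macOS, `uint64_t` is `unsigned long long` while `std::size_t` is `unsigned long`, so mixing them fails deduction. On typical 64-bit Linux both are `unsigned long`, which is why such code can compile there. Naming `T` explicitly avoids the problem. The function name is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical example of the fix pattern for a mixed-type std::min.
uint64_t
clampBits(uint64_t requested_bits, std::size_t container_bits)
{
    // return std::min(requested_bits, container_bits); // fails on macOS
    return std::min<uint64_t>(requested_bits, container_bits);
}
```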
Mahyar Samani	7ff1e381c9	cpu,stdlib: Fix Access Trace for Accessing Indices in SpatterGen (#1258) This change fixes the way indices are generated in a multi-generator setup. Instead of every core generating the same trace of indices into the index array, each core now generates an interleaved subset of the indices. For example, the traces (indices into the index array) in a 2-core setup are: Before: core_0: 0, 1, 2, 3, 4, 5, 6, 7, ... core_1: 0, 1, 2, 3, 4, 5, 6, 7, ... After: core_0: 0, 1, 2, 3, 8, 9, 10, 11, ... core_1: 4, 5, 6, 7, 12, 13, 14, 15, ... Additionally, this change updates the SpatterKernel class in the standard library to comply with the change in the SpatterGen source code.	2024-06-20 11:24:44 -07:00
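The "After" traces above follow a round-robin split of the index positions in fixed-size chunks. The helper below is a hypothetical sketch that reproduces that pattern. The chunk size of 4 used in the example is inferred from the traces, not taken from the SpatterGen source.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical: assign index positions 0..total-1 to cores in round-robin
// chunks of chunk_size; returns the positions owned by `core`.
std::vector<std::size_t>
indicesForCore(std::size_t core, std::size_t num_cores,
               std::size_t chunk_size, std::size_t total)
{
    std::vector<std::size_t> out;
    for (std::size_t chunk_start = core * chunk_size; chunk_start < total;
         chunk_start += num_cores * chunk_size) {
        for (std::size_t i = chunk_start;
             i < chunk_start + chunk_size && i < total; ++i)
            out.push_back(i);
    }
    return out;
}
```

With 2 cores, a chunk size of 4, and 16 indices, core 0 gets 0-3 and 8-11, and core 1 gets 4-7 and 12-15, matching the traces above.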
Bobby R. Bruce	36f73f671d	cpu,stdlib: Adding Spatter (#1136) This PR adds the C++ implementation of SpatterGen as well as SpatterKernel. SpatterGen uses a PyBindMethod to add kernels to the backend code, so JSON file processing can be offloaded to Python. In addition, it adds standard library components for SpatterGenCore and SpatterGen. These two components follow the same structure as AbstractCore and AbstractProcessor. spatter_kernel.py also adds a Python definition of SpatterKernel to make adding kernels to C++ easier, along with utility functions for parsing dictionaries read from JSON and for partitioning traces for multicore setups.	2024-06-17 15:28:45 -07:00
Hoa Nguyen	15e0236a8b	arch,cpu,sim: Add mechanism to partially print vector regs (#1234) Currently, gem5's inst tracer prints the whole vector register container by default. The size of vector register containers in gem5 is the maximum size allowed by the ISA. For vector-length agnostic (VLA) vector registers, this means the Arm SVE vector container is 2048 bits long, and the RISC-V vector container is 65536 bits long. Note that the VLA implementation in gem5 allows the vector length to vary within the limit specified by the ISAs. However, in most use cases of gem5, the vector length is much less than 65536 bits. This causes two issues: (1) the vector container requires allocating and moving around a large amount of unused data while only a fraction of it is used, and (2) printing the execution trace of a vector register results in a wall of text with a small amount of useful data. This change addresses problem (2) by providing a mechanism to limit the amount of data printed by the instruction tracer. This is done by adding a function that prints the first X bits of a vector register container, where X is the vector length determined at runtime, as opposed to the vector container size, which is determined at compilation time. Change-Id: I815fa5aa738373510afcfb0d544a5b19c40dc0c7 --------- Signed-off-by: Hoa Nguyen <hn@hnpl.org>	2024-06-17 14:05:47 -07:00
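Here is a minimal sketch of printing only the runtime vector length rather than the whole container. The byte-vector container, the lowest-index-first byte order, and the function name are assumptions for illustration, not gem5's `VecRegContainer` API. The length is also assumed to be a multiple of 8 bits.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical: print the first vlen_bits of a container sized for the ISA
// maximum, as two hex digits per byte, lowest index first.
std::string
printVecPrefix(const std::vector<uint8_t> &container, std::size_t vlen_bits)
{
    std::size_t nbytes = std::min(vlen_bits / 8, container.size());
    std::ostringstream os;
    os << std::hex << std::setfill('0');
    for (std::size_t i = 0; i < nbytes; ++i)
        os << std::setw(2) << static_cast<unsigned>(container[i]);
    return os.str();
}
```

For a 65536-bit container with a 128-bit runtime vector length, this prints 32 hex digits instead of 16384.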
Giacomo Travaglini	2804311f7b	cpu-o3: Revert "Do not set Executed on load instruction to be replayed" (#1251) Reverts gem5/gem5#1182. This is breaking O3 execution. Investigating the matter.	2024-06-17 12:24:43 -07:00
Mahyar Samani	6695e5ef70	cpu: Adding SpatterGen This change adds source code for the SpatterGen ClockedObject. The source code includes SpatterKernel, which tracks whether information is being gathered or scattered as well as the list of indices to be accessed. This model has a PyBindMethod to add SpatterKernels from Python, so all the preparation of kernels can be done in Python. SpatterGen has a few parameters that model limits on several hardware resources in the backend of a processor, e.g., the number of functional units that calculate effective addresses, the latency of calculating an effective address, and the number of integer registers. Change-Id: I451ffb385180a914e884cab220928c5f1944b2e3	2024-06-14 10:45:09 -07:00
Minje Jun	b8e21a2d32	cpu-o3: Do not set Executed on load instruction to be replayed (#1182) A load instruction can be replayed when 1) it is strictly ordered or 2) it encounters a load-store forwarding mismatch. Case 1 was handled in the executeLoad function, but case 2 was not. As a result, a load replayed due to case 2 violates the assertion "assert(!load_inst->isExecuted())" in LSQUnit::read. This commit fixes the problem by also handling case 2 in LSQUnit::executeLoad. Co-authored-by: Minje Jun <minje.jun@samsung.com>	2024-06-14 10:12:26 -07:00
Jason Lowe-Power	21ffd91529	cpu,arch: Add IsInvalid flag to Unknown insts (#1071) The IsInvalid flag indicates that the static instruction is not part of the executing ISA and not part of m5's pseudo-instructions. This flag provides a way to recognize an illegal instruction at the decode stage.	2024-06-13 16:26:35 -07:00
Harshil Patel	74afea471d	cpu: Revert "Don't change to suspend if the thread status is halted" (#1225) Reverts gem5/gem5#1039.	2024-06-12 00:20:06 -07:00
Hoa Nguyen	369029d2be	cpu: Add IsInvalid flag to StaticInstFlags The IsInvalid flag indicates that the static instruction is not part of the executing ISA and not part of m5's pseudo-instructions. This flag provides a way to recognize an illegal instruction at the decode stage. Change-Id: I2779c6edcd8c5e6a77ea11cad3ff73bacb79d800 Signed-off-by: Hoa Nguyen <hn@hnpl.org>	2024-06-11 18:48:29 +00:00
Ivana Mitrovic	a764b9be1c	Revert "arch-x86: Fix TLB Assertion Error on CFLUSH" (#1196) Reverts gem5/gem5#1080 as it is not a good fix.	2024-06-04 10:26:53 -07:00
Lukas Zenick	dad5c7b6f7	arch-x86: Fix TLB Assertion Error on CFLUSH (#1080) Fixed the assertion statement in the CPU's translation.hh file so that it doesn't fail if the cache is clean. I compiled this C code to `test`:
```c
#include <stdio.h>

static inline void clflush(volatile void *p)
{
    __asm__ volatile ("clflush (%0)" : : "r"(p) : "memory");
}

int main()
{
    int data = 42; // Example variable
    printf("Value before clflush: %d\n", data);
    clflush(&data);
    printf("Value after clflush: %d\n", data);
    return 0;
}
```
and ran it with `./build/X86/gem5.opt configs/learning_gem5/part1/two_level.py ./test` to verify that it no longer fails the assertion check. GitHub Issue: #862 Change-Id: I6004662e7c99f637ba0ddb07d205d1657708e99f	2024-06-03 10:17:10 -07:00
Harshil Patel	0824d7f2cd	Revert "cpu-kvm: Support perf counters on hybrid host architectures" (#1127) Reverts gem5/gem5#1065. This change is reverted because the PR breaks x86 KVM, as reported in issue #1126.	2024-05-21 08:14:10 -07:00
Yu-Cheng Chang	321bd07163	cpu: Don't change to suspend if the thread status is halted (#1039) In gem5, a thread context can be in one of four states: Active, Suspended, Halting, and Halted `5641c5e464/src/cpu/thread_context.hh (L99-L117)`. When a gem5 instance is initialized, all thread contexts are set to Halted. A thread context does not become Active until the workload's startup initialization runs, except with StubWorkload. So if the user uses StubWorkload and the CPU is connected to the model_reset port, the CPU's thread context may still be activated. The following steps activate a CPU's thread context without workload initialization [1] or lowering the model_reset port [2]: 1. Raise the model_reset port (changes the state from Halted to Suspended) `5641c5e464/src/cpu/base.cc (L671-L673)` 2. Post an interrupt to the CPU (changes the state from Suspended to Active) `5641c5e464/src/cpu/base.cc (L231-L239)` Implementation of wakeup: SimpleCPU: `5641c5e464/src/cpu/simple/base.cc (L251-L259)` MinorCPU: `5641c5e464/src/cpu/minor/cpu.cc (L143-L151)` O3CPU: `5641c5e464/src/cpu/o3/cpu.cc (L1337-L1346)` This CL fixes the issue that occurs when the model_reset port is raised (putting the CPU to sleep) on a CPU that was not activated by a workload: if the CPU status is Halted, it should not change to Suspended, so that it cannot be woken up. Reference: model_reset was introduced in CL https://gem5-review.googlesource.com/c/public/gem5/+/67574/4 [1] Activation by workload (Arm example): `5641c5e464/src/arch/arm/fs_workload.cc (L101-L114)` [2] Lowering model_reset: `5641c5e464/src/cpu/base.cc (L191-L192)` `5641c5e464/src/cpu/base.cc (L674-L685)` Change-Id: I5bfc0b7491d14369fff77b98b71c0ac763fb7c42	2024-05-16 10:02:53 -07:00
OdnetninI (Eduardo José Gómez Hernández)	17cbbd84ae	cpu: Indirect predictor track conditional indirect (#1077) As discussed in https://github.com/orgs/gem5/discussions/954: in the refactor made by commit `f65df9b959`, conditional indirect branches are no longer updated in the indirect predictor. Such branches exist in neither x86 nor Arm, but they are present in PowerPC. This patch enables the indirect predictor to track them.	2024-04-29 11:38:22 +01:00
Nicholas Mosier	c679c9c127	cpu-o3: prioritize exiting threads when committing (#1056) Fix #1055. Prioritize committing from exiting threads before we consider other threads using the specified SMT commit policy. All instructions in the ROB for exiting threads should already have been squashed. Thus, this ensures that the ROB instruction queues for all exiting threads will be empty at the end of the current cycle, avoiding the assertion failure encountered in #1055. Change-Id: Ib0178a1aa6e94bce2b6c49dd87750e82776639dc	2024-04-25 11:15:14 -07:00
Nicholas Mosier	51d546cb06	cpu-o3: Clear current macro-op in fetch if squashing after last micro-op (#1047) Fix #1042. Clear the current fetch macro-op if the instruction initiating the squash is the last micro-op in its macro-op. Change-Id: I77f60334771277e47f19573d4067b3a7bc5488b2	2024-04-25 11:14:58 -07:00
Nicholas Mosier	cf5ec880c9	cpu-kvm: Support overflows when migrating across hybrid cores Add support for event overflows when the host thread migrates across different types of cores on a hybrid host architecture. This patch achieves this by simply halving the sample period for each performance counter. Since there are two types of cores, this guarantees that an overflow event will trigger before N events occur, where N is the requested period (e.g., number of instructions to simulate). This may result in many early triggers (up to log2(N)) before the requested period is reached. However, gem5's existing bookkeeping logic already handles this case properly: if fewer events than requested occurred, it will set a new period (N - observed) and resume execution. This loop will exit once N events have actually occurred. Change-Id: Iff85237da1ae1aa25bc2045fbf9091726291fe36	2024-04-24 09:47:46 -07:00
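The halving argument works as follows. Two counters share the work, and each is armed with half of the remaining period. If the combined count ever reached the full period, at least one counter would have reached its half, so an overflow always fires first. The loop below is a hypothetical model of the "set a new period (N - observed) and resume" bookkeeping. `run` stands in for executing until the first overflow and returning the events observed. It is assumed never to return more than `2 * period - 1`, the most two counters can accumulate before either overflows.

```cpp
#include <cstdint>
#include <functional>

struct RunStats
{
    uint64_t events = 0;  // total events observed
    unsigned stops = 0;   // overflow-triggered stops, including the last
};

// Hypothetical model of the retry loop with a halved sample period.
RunStats
runForEvents(uint64_t target, const std::function<uint64_t(uint64_t)> &run)
{
    RunStats stats;
    uint64_t remaining = target;
    while (remaining > 0) {
        uint64_t period = remaining / 2 > 0 ? remaining / 2 : 1;
        uint64_t observed = run(period);  // <= 2 * period - 1 <= remaining
        stats.events += observed;
        remaining -= observed;
        ++stats.stops;
    }
    return stats;
}
```

If every overflow fires as early as possible, a target of 1024 takes 11 stops: 10 early triggers (log2(1024)) plus the final one. The loop never overshoots the target.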
Nicholas Mosier	30ea15009f	cpu-kvm: Support perf counters on hybrid host architectures Fix #1064 by adding support for hardware performance counters on hybrid architectures like Intel Alder Lake. Hybrid architectures have multiple types of cores, each of which requires the instantiation of a separate performance counter. The KVM CPU's PerfKvmCounter class was not aware of this and only instantiated a single performance counter, implicitly bound to the P-cores only. This meant that if gem5 ever ran on an E-core, the various hardware performance counters would not be updated properly, in some cases always reading zero (e.g., the number of instructions executed). This patch adds support for hybrid host architectures as follows. First, we convert PerfKvmCounter into an abstract class, which has two concrete implementations: SimplePerfKvmCounter and HybridPerfKvmCounter. The former is used for non-hybrid architectures or for non-hardware performance counters and is functionally equivalent to the prior implementation of PerfKvmCounter. The latter is used for instantiating hardware performance counters (i.e., of type PERF_TYPE_HARDWARE) on hybrid host architectures. It does so by internally instantiating two SimplePerfKvmCounters, one for a P-core and one for an E-core. Upon read, it sums the results of reading the two internal counters. Change-Id: If64fcb0e2fcc1b3a6a37d77455c2b21e1fc81150	2024-04-24 09:47:46 -07:00
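The class split described above can be outlined with simplified, hypothetical stand-ins. These are not gem5's PerfKvmCounter classes: the real counters read values from perf_event file descriptors, whereas here the count is just a field.

```cpp
#include <cstdint>

// Hypothetical abstract counter interface.
class PerfCounterSketch
{
  public:
    virtual ~PerfCounterSketch() = default;
    virtual uint64_t read() const = 0;
};

// One counter bound to a single core type (or a non-hardware event).
class SimpleCounterSketch : public PerfCounterSketch
{
  public:
    uint64_t value = 0;  // stand-in for the kernel-maintained count
    uint64_t read() const override { return value; }
};

// Hardware event on a hybrid host: one counter per core type, summed.
class HybridCounterSketch : public PerfCounterSketch
{
  public:
    SimpleCounterSketch pCore, eCore;
    uint64_t read() const override { return pCore.read() + eCore.read(); }
};
```

Code that holds a `PerfCounterSketch &` reads the total without knowing which core types the thread ran on.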
Jason Lowe-Power	c13aa7727d	cpu: Fix Ruby/x86 pio port connections (#1035) Fixes #1033. In the BaseCPU object, _uncached_interrupt_response_ports is a class variable, not an instance variable. #1004 changed the explicit self._uncached_interrupt_response_ports assignment to use extend. This caused the list of ports to be extended for all cores, which caused problems when using a system with more than one core. This reverts the `extend` part of the change but keeps the rest. Change-Id: I6dc7d6da6763048d82960229d34933a3a2ac36e0 Signed-off-by: Jason Lowe-Power <jason@lowepower.com>	2024-04-17 08:20:04 -07:00
Yu-Cheng Chang	ebb70dea99	cpu: Fix KVM false negative warning after Kconfig transition (#1013) When gem5 is built, all SConsopts files are read and processed, and the after_sconsopts_callbacks run only after all SConsopts files have been read. The KVM_ISA env setting can be set in different files; taking x86 and Arm as examples: KVM_ISA default value: `bc39283451/src/cpu/kvm/SConsopts` x86 KVM_ISA: `bc39283451/src/arch/x86/kvm/SConsopts (L39-L45)` Arm KVM_ISA: `bc39283451/src/arch/arm/kvm/SConsopts (L35-L36)` The KVM warning should therefore be emitted only after all SConsopts env settings have been read. Issue: https://github.com/gem5/gem5/issues/686 Change-Id: I096c6bebaaec18f9b2af93191d0dd23c65084eda	2024-04-12 09:23:56 -07:00
Nicholas Mosier	bc39283451	cpu-o3, arch-x86: initialize interrupts for all SMT threads (#1007) Fix issue #1004. When enabling SMT with the O3 CPU, only the first interrupts object was initialized properly. This patch initializes all interrupts objects, one per SMT thread. Change-Id: I300782b645bd8ea3ef2497278fb73125ab4bf495	2024-04-11 11:17:24 -07:00
Ivan Fernandez	c91d1253de	cpu: This commit updates cpu FUs according to new Simd types This commit updates cpu by removing VectorXXX types and updates FUs according to the newer SimdXXX ones. This is part of the homogenization of RISCV Vector instruction types, which moved from VectorXXX to SimdXXX. Change-Id: I84baccd099b73a11cf26dd714487a9f272671d3d	2024-03-25 19:01:47 +01:00
Ivan Fernandez	1e743fd85a	arch-riscv: adding vector unit-stride segment stores to RISC-V (#913) This commit adds support for vector unit-stride segment store operations for RISC-V (vssegXeXX). This implementation is based on two types of microops: - VsSegIntrlv microops that properly interleave source registers into structs. - VsSeg microops that store data in memory as contiguous structs of several fields. Change-Id: Id80dd4e781743a60eb76c18b6a28061f8e9f723d Gem5 issue: https://github.com/gem5/gem5/issues/382	2024-03-22 15:45:58 -07:00
Ivan Fernandez	f6c61836b3	arch-riscv: adding vector unit-stride segment loads to RISC-V (#851) This commit adds support for vector unit-stride segment load operations for RISC-V (vlseg<NF>e<X>). This implementation is based on two types of microops: - VlSeg microops that load data as it is organized in memory, in structs of several fields. - VectorDeIntrlv microops that properly deinterleave structs into destination registers. Gem5 issue: https://github.com/gem5/gem5/issues/382	2024-03-06 11:27:06 -08:00
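The deinterleave and interleave steps used by the segment load and store microops above move data between two layouts. In memory, NF-field structs are stored back to back. In registers, field f of every struct goes to register f. The functions below are a hypothetical sketch of that data movement only: no microop split, masking, or faults. They assume the memory length is a multiple of `nf` and that all registers hold the same number of elements.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Segment load direction: memory order -> one register per field.
std::vector<std::vector<uint32_t>>
deinterleave(const std::vector<uint32_t> &mem, std::size_t nf)
{
    std::vector<std::vector<uint32_t>> regs(nf);
    for (std::size_t i = 0; i < mem.size(); ++i)
        regs[i % nf].push_back(mem[i]);
    return regs;
}

// Segment store direction: one register per field -> memory order.
std::vector<uint32_t>
interleave(const std::vector<std::vector<uint32_t>> &regs)
{
    std::vector<uint32_t> mem;
    std::size_t nelems = regs.empty() ? 0 : regs[0].size();
    for (std::size_t e = 0; e < nelems; ++e)
        for (const auto &reg : regs)
            mem.push_back(reg[e]);
    return mem;
}
```

With two-field structs `{10, 20}, {11, 21}, {12, 22}`, register 0 receives 10, 11, 12 and register 1 receives 20, 21, 22. Interleaving those registers restores the original memory order.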
Giacomo Travaglini	8759131df3	cpu-o3, arch: Fix SMT bug arising from v23.0 and make gem5 more robust with SMT (#828) This PR fixes https://github.com/gem5/gem5/issues/668. The first commit fixes it for all ISAs other than Arm by setting the number of architectural Matrix registers to 0 for ISAs that do not use them. The 2nd commit then partly fixes it for Arm as well: by removing RenameMap::numFreeEntries, renaming no longer stalls unless a matrix instruction is encountered. This means most binaries will run with SMT as long as they don't use FEAT_SME instructions. Please note: this is not simply an SMT fix; it more generally addresses a shortcoming in the way instructions were renamed. If an Arm binary wants to use SMT with FEAT_SME, the 4th commit ensures that the lack of physical registers is reported explicitly at the beginning of simulation, rather than silently blocking renaming.	2024-02-19 08:52:31 +00:00
Arnabjyoti Kalita	b826d96f40	cpu-o3: add PerThreadUnifiedThreadMap to O3 CPU (#842) Github issue: https://github.com/gem5/gem5/issues/373 Change-Id: I1c8aba9bc5ea4e45faa6c174780904b8bd618604	2024-02-12 09:26:31 -08:00
Giacomo Travaglini	4eb0cd44fc	cpu-o3: Restrict constraint on number of physical registers Having the number of physical registers exactly match the number of architectural ones does not guarantee proper execution, as it means the freeList would have 0 registers available for renaming. In this case the worst would happen: renaming would silently stall execution indefinitely. With this change, we report the issue to the user and fail execution. Change-Id: I1eb968802f1a1a5115012f44b541542a682f887d Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-02-02 21:18:32 +00:00
Giacomo Travaglini	1fb7c1ad7e	cpu-o3: Rename numFreeEntries into minFreeEntries Change-Id: I89faeb001ebdcbc90ea88508f8d231ec6e7fe197 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-02-02 18:11:47 +00:00
Giacomo Travaglini	86158de220	cpu-o3: Stop using RenameMap::numFreeEntries The method [1] returns the minimum number of free registers/entries across all register classes (ignoring classes of size zero). This means that if all register storage for a particular class is saturated, renaming stops as a whole. I believe it makes sense to keep renaming and only block renaming when an instruction requiring that particular register type is encountered. This would happen in the Rename::renameInsts method [2]. [1]: https://github.com/gem5/gem5/blob/stable/src/cpu/o3/rename_map.hh#L269 [2]: https://github.com/gem5/gem5/blob/stable/src/cpu/o3/rename.cc#L662 Change-Id: I932826a77a5c0b2e05d8fdcab0e6ca13cf0e3d23 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-02-02 18:11:47 +00:00
Mahyar Samani	b79fe82e5c	cpu,stdlib: Updating strided generator (#762) This change makes the strided generator more flexible in the traces it can create. It allows the user to set the offset and stride size manually instead of calculating them from a "gen_id", so different patterns can be created with the same SimObject. In addition, this change adds stdlib components for the strided generator.	2024-02-01 09:08:42 -08:00
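The two renaming policies can be contrasted in a short sketch. The register-class names, the map of free counts, and both function names are hypothetical, and the sketch only includes classes that are actually in use. It does not reproduce gem5's `RenameMap` or `Rename::renameInsts`.

```cpp
#include <algorithm>
#include <limits>
#include <map>
#include <string>

// Hypothetical: register class name -> count (free or needed).
using RegCounts = std::map<std::string, unsigned>;

// Old policy (sketch): stall whenever any class has no free registers.
bool
canRenameGlobalMin(const RegCounts &free)
{
    unsigned min_free = std::numeric_limits<unsigned>::max();
    for (const auto &entry : free)
        min_free = std::min(min_free, entry.second);
    return min_free > 0;
}

// New policy (sketch): stall only if a class this instruction needs
// lacks enough free physical registers.
bool
canRenamePerClass(const RegCounts &free, const RegCounts &needed)
{
    for (const auto &entry : needed) {
        auto it = free.find(entry.first);
        if (it == free.end() || it->second < entry.second)
            return false;
    }
    return true;
}
```

With the matrix free list exhausted, the global-minimum check blocks every instruction. The per-class check still lets integer-only instructions rename and stalls only a matrix instruction.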
Matthew Poremba	63caa780c2	misc: Remove all references to GCN3 Replace instances of "GCN3" with Vega. Remove gfx801 and gfx803. Rename FIJI to Vega and Carrizo to Raven. Using misc since there is not enough room to fit all the tags. Change-Id: Ibafc939d49a69be9068107a906e878408c7a5891	2024-01-17 11:11:06 -06:00
Bobby R. Bruce	213d0b0bfe	cpu: 'suppressFuncErrors' -> 'pkt->suppressFuncError()' fix Change-Id: If4aa71e9f6332df2a3daa51b69eaad97f6603f6b	2023-12-20 09:15:15 -08:00
Hoa Nguyen	7a5052b3a0	arch-arm: Only build ArmCapstoneDisassembler when ISA is arm (#553) Currently, if the Capstone header file is found on the host system, scons will try to build the ArmCapstoneDisassembler regardless of the gem5 target ISA. This causes problems when the host has Capstone but the gem5 target ISA is not Arm: compiling gem5 in this case fails with errors, e.g., ArmISA and ArmSystem are not found. This change prevents building the ArmCapstoneDisassembler when the gem5 target ISA is not Arm. Ref: [1] The Arm Capstone PR https://github.com/gem5/gem5/pull/494 Change-Id: I1e714d34aec8fe2a2af8cd351536951053a4d8a5	2023-12-03 13:22:11 -08:00
Richard Cooper	2fbbdad618	base: Add encapsulation to the loader::Symbol class This commit converts `gem5::loader::Symbol` to a full class with private members, enforcing encapsulation. Until now client code has been able to (and does) access members directly. This change will enable class invariants to be enforced via accessor methods. Change-Id: Ia0b5b080d4f656637a211808e13dce1ddca74541 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>	2023-12-01 22:00:26 +00:00
Andreas Sandberg	dcdebec0f6	misc,python: Add `isort` hook to pre-commit (#431)	2023-11-30 09:54:12 +00:00
Bobby R. Bruce	d11c40dcac	misc: Run `pre-commit run --all-files` This ensures `isort` is applied to all files in the repo. Change-Id: Ib7ced1c924ef1639542bf0d1a01c5737f6ba43e9	2023-11-29 22:06:41 -08:00
Adrià Armejach	eb13b32314	cpu-o3: Fix discarded requests str-ld forwarding (#614) With large RVV vectors (e.g., 8K or 16K bits) and a limited number of cacheLoadPorts, some loads take multiple cycles to execute. This triggered certain conditions when store-to-load forwarding happens in the middle of the execution of a load that already has outstanding packets. First, after store-to-load forwarding, the request is marked as discarded and the load is immediately written back, which triggers a writebackDone that tries to delete the request, firing an assert because the request still has outstanding packets. This patch avoids deleting the request, leaving it self-owned; it is deleted when the last packet arrives in packetReplied. Second, this patch avoids checking snoops on discarded requests by checking whether the request exists. Change-Id: Icea0add0327929d3a6af7e6dd0af9945cb0d0970 Co-authored-by: Adrià Armejach <adria.armejach@bsc.es>	2023-11-29 08:45:03 -08:00

2637 Commits