derek/gem5

Author	SHA1	Message	Date
Alexander Richardson	aa2fade12e	Drop unrelated change	2024-05-01 18:00:09 +01:00
Alex Richardson	bb4c13143c	arch-generic: Fix reading from special :semihosting-features file The implementation of SYS_FLEN was missing, which caused picolibc to treat this file as not implemented. Additionally, there was a bug in the SYS_READ call that was comparing the wrong variable against the passed buffer length. It was comparing the current file position against the buffer length instead of the number of written bytes. Finally, pos was uninitialized, which could result in spurious errors. Change-Id: I8b487a79df5970a5001d3fef08d5579bb4aa0dd0	2024-04-30 16:28:06 -07:00
Yangyu Chen	666d1dd9a2	arch-riscv: Add Integer Conditional operations extension (Zicond) instructions (#1078 ) This PR added RISC-V Integer Conditional Operations Extension, which is in the RVA23U64 Profile Mandatory Base. And the performance of conditional move instructions in micro-architecture is an interesting point to explore. Zicond instructions added: czero.eqz, czero.nez Changes based on spec: https://github.com/riscvarchive/riscv-zicond/releases/download/v1.0.1/riscv-zicond_1.0.1.pdf	2024-04-30 05:44:45 -07:00
OdnetninI (Eduardo José Gómez Hernández)	17cbbd84ae	cpu: Indirect predictor track conditional indirect (#1077) As discussed in https://github.com/orgs/gem5/discussions/954: In the refactor made by commit `f65df9b959` conditional indirect branches are no longer updated in the indirect predictor. Branches of this kind do not exist in x86 or Arm, but they are present in PowerPC. This patch enables the indirect predictor to track them.	2024-04-29 11:38:22 +01:00
Alexander Richardson	1bb5d3b99e	arch-riscv: Add support for RISC-V semihosting (#681 ) See https://github.com/riscv-software-src/riscv-semihosting for the current specification. Almost all code is shared with the Arm implementation. Tested by running some binaries built with [picolibc](https://github.com/picolibc/picolibc).	2024-04-27 05:12:32 -07:00
Ivana Mitrovic	939d8e28df	mem-cache: Fix TreePLRU num leaves error (#1075 ) This PR fixes the error noted here #1073. Change-Id: I5d31c259ac5ee93f46f28b20eda4f58460ba8523	2024-04-26 20:22:20 -07:00
Robert Hauser	1b323a9571	systemc: remove if clause in Gem5ToTlmBridgeBase (#1059 ) In the payload event queue in Gem5ToTlmBridgeBase, the phase is checked twice for BEGIN_RESP. This commit removes the second if clause since it is unnecessary. Duplicate if clause in line 234 & line 256 `dd2689905f/src/systemc/tlm_bridge/gem5_to_tlm.cc (L234-L267)` please correct me if I am missing something important	2024-04-25 11:15:30 -07:00
Nicholas Mosier	c679c9c127	cpu-o3: prioritize exiting threads when committing (#1056 ) Fix #1055. Prioritize committing from exiting threads before we consider other threads using the specified SMT commit policy. All instructions in the ROB for exiting threads should already have been squashed. Thus, this ensures that the ROB instruction queues for all exiting threads will be empty at the end of the current cycle, avoiding the assertion failure encountered in #1055. Change-Id: Ib0178a1aa6e94bce2b6c49dd87750e82776639dc	2024-04-25 11:15:14 -07:00
Nicholas Mosier	51d546cb06	cpu-o3: Clear current macro-op in fetch if squashing after last micro-op (#1047 ) Fix #1042. Clear the current fetch macro-op if the instruction initiating the squash is the last micro-op in its macro-op. Change-Id: I77f60334771277e47f19573d4067b3a7bc5488b2	2024-04-25 11:14:58 -07:00
Nicholas Mosier	66decb2e93	mem-ruby: Fix functional reads for MESI Three-Level messages (#1045 ) Fix #1044. This patch adds checks for message types (PUTX_COPY, DATA, DATA_EXCLUSIVE) that contain data blocks but were missing from the original `functionalRead` method in MESI Three-Level messages. Change-Id: I0cedc314166c9cc037bf20f5b7fef5552dd1253c	2024-04-25 11:14:37 -07:00
Giacomo Travaglini	83e55743e1	arch-arm: Add misc_accessor templated functions to read/write regs at different ELs (#1072) A usual system register read/write pattern is something like the following ``` switch(el) { case EL1: tc->readMiscReg(REG_EL1); case EL2: tc->readMiscReg(REG_EL2); case EL3: tc->readMiscReg(REG_EL3); } ``` To avoid repeating these switch statements all over gem5, we define templated functions which have an accessor struct as a template parameter. These accessors help populate the templated switch construct. We provide the FAR register accessor as an example. The accessor should define the following fields: (type, el0, el1, el2, el3) Example: ``` struct FarAccessor { using type = RegVal; static const MiscRegIndex el0 = NUM_MISCREGS; static const MiscRegIndex el1 = MISCREG_FAR_EL1; static const MiscRegIndex el2 = MISCREG_FAR_EL2; static const MiscRegIndex el3 = MISCREG_FAR_EL3; }; ```	2024-04-25 14:57:10 +01:00
Andreas Sandberg	85d21b5718	cpu-kvm: Support perf counters on hybrid host architectures (#1065) Fix #1064 by adding support for hardware performance counters on hybrid architectures like Intel Alder Lake. Hybrid architectures have multiple types of cores, each of which requires the instantiation of a separate performance counter. The KVM CPU's PerfKvmCounter class was not aware of this, and only instantiated a single performance counter, implicitly bound to the P-core only. This meant that if gem5 ever ran on an E-core, the various hardware performance counters would not get updated properly, in some cases always zero (e.g., for the number of instructions executed). This patch adds support for hybrid host architectures as follows. First, we convert PerfKvmCounter into an abstract class, which has two concrete implementations: SimplePerfKvmCounter and HybridPerfKvmCounter. The former is used for non-hybrid architectures or for non-hardware performance counters and is functionally equivalent to the prior implementation of PerfKvmCounter. The latter is used for instantiating hardware performance counters (i.e., of type PERF_TYPE_HARDWARE) on hybrid host architectures. It does so by internally instantiating two SimplePerfKvmCounters, one for a P-core and one for an E-core. Upon read, it sums the results of reading the two internal counters. Change-Id: If64fcb0e2fcc1b3a6a37d77455c2b21e1fc81150	2024-04-25 10:45:47 +01:00
Giacomo Travaglini	a3d030d161	arch-arm: Add the FAR_EL* register accessor Use it accordingly in the faulting/exception logic Change-Id: I2f6360d04698b6fb7188e776f1d6966e99ce19b1 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Richard Cooper <richard.cooper@arm.com>	2024-04-25 09:45:54 +01:00
Giacomo Travaglini	19628e746d	arch-arm: Add readRegister/writeRegister templates This is adding two templated functions for reading/writing system registers (MiscRegs). It is introducing them inside a new misc_regs namespace. Change-Id: I21233337c057673d46d1147971ebabbfc2c2bb6a Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Richard Cooper <richard.cooper@arm.com>	2024-04-25 09:45:00 +01:00
Ivana Mitrovic	cc3655cdad	arch-arm: Refactor PTW (#1060 ) This PR is refactoring the Arm PageTableWalker in the following way: 1) Simplifying the currState handling logic (mainly the tear down) 2) Amending the TlbTestInterface APIs to use a RequestPtr reference 3) Use finalizePhysical even when MMU is off, which means allowing memory mapped m5ops to work also in that circumstance	2024-04-24 21:00:42 -07:00
Nicholas Mosier	ed8a09303a	mem-cache: Remove power-of-2 requirement for TreePLRU num leaves (#1061) Remove the requirement in TreePLRU's implementation that the number of leaves (i.e., the number of cache ways) be a power of two. Firstly, on some recent processors, this is not the case; for example, Intel Golden Cove's L1D has 12 ways. Secondly, the implementation of TreePLRU appears to work just fine as-is with a way count that's not a power of two. Change-Id: If2a27dc5bbe7a8e96684f79ce791df5c0b582230	2024-04-24 20:59:06 -07:00
Giacomo Travaglini	bf78579fa5	arch-arm: Change the TlbTestInterface to accept a RequestPtr Now that the Request has been made an Extensible object, it can carry within itself much more data. It makes sense to pass it to the TlbTestInterface as more information about the table walk can be extracted from it. This is also aligning with the testTranslation utility which is expecting a request reference as first argument. Change-Id: I3dbc9a81d6b4bcc1801246ba7eb4136774d8f3c7 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Richard Cooper <richard.cooper@arm.com>	2024-04-24 18:12:36 +01:00
Giacomo Travaglini	89323c5112	arch-arm: Group testTranslation and finalizeTranslation together They both make final checks to the VA->PA translation before relinquishing control back to the translate client (usually CPU code) Change-Id: Ib0a9da25404248c22c6a240817d2f50f0913fdf7 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Richard Cooper <richard.cooper@arm.com>	2024-04-24 18:12:36 +01:00
Giacomo Travaglini	0c20eb3ec7	arch-arm: Call finalizePhysical even when MMU is off The finalizePhysical is just checking if the physical address falls within the m5op region (if using mmapped m5ops). There's no reason why we shouldn't enable it with virtual memory off Change-Id: I5ab80fd4e7886743abd4b7d85937b72253b578d3 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Richard Cooper <richard.cooper@arm.com>	2024-04-24 18:12:36 +01:00
Giacomo Travaglini	a299d2db0c	arch-arm: Move testWalk check within the fetchDescriptor We also unify the fault handling logic; rather than cleaning up the WalkerState in several places scattered throughout the walking code, we handle faults in the top level method Change-Id: Ia22fb6f27044ff445fffbab228777a48efa473cb Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Richard Cooper <richard.cooper@arm.com>	2024-04-24 18:12:36 +01:00
Giacomo Travaglini	6d0cb6eaa3	arch-arm: Pull out Request generation from the TableWalker::Port Change-Id: Ie8c309bb79b4ce7c656428660c9e2effd58a89f0 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Richard Cooper <richard.cooper@arm.com>	2024-04-24 18:12:36 +01:00
Giacomo Travaglini	e450cfef16	arch-arm: Move testWalk functionality to the TableWalker class It's more efficient to pass a reference of the tester to the TableWalkers. In this way a table walk check is tested directly from the walkers instead of going through the MMU every time. Change-Id: I9820dbabb8b551981005a65efa54a76b1a027541 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Richard Cooper <richard.cooper@arm.com>	2024-04-24 18:12:36 +01:00
Giacomo Travaglini	bbe5bf2644	arch-arm: Simplify TableWalker::walk method Change-Id: Ib823b3b577a70f6ec14de854cb9c250faa04e932 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Richard Cooper <richard.cooper@arm.com>	2024-04-24 18:12:36 +01:00
Giacomo Travaglini	9d9b7848bb	arch-arm: Properly compute EL even in stage2 walks This is done in order to differentiate between EL0 (unprivileged) and EL1. Effectively it won't change much as most of the decisions are now taken according to the translation regime which will be the same regardless (EL10) Change-Id: I218037e9c19cf638aff05c51869e439204d9af69 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Richard Cooper <richard.cooper@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>	2024-04-24 18:12:36 +01:00
Nicholas Mosier	cf5ec880c9	cpu-kvm: Support overflows when migrating across hybrid cores Add support for event overflows when the host thread migrates across different types of cores on a hybrid host architecture. This patch achieves this by simply halving the sample period for each performance counter. Since there are two types of cores, this guarantees that an overflow event will trigger before N events occur, where N is the requested period (e.g., number of instructions to simulate). This may result in many early triggers (up to log2(N)) before the requested period is reached. However, gem5's existing bookkeeping logic already handles this case properly: if fewer events than requested occurred, it will set a new period (N - observed) and resume execution. This loop will exit once N events have actually occurred. Change-Id: Iff85237da1ae1aa25bc2045fbf9091726291fe36	2024-04-24 09:47:46 -07:00
Nicholas Mosier	30ea15009f	cpu-kvm: Support perf counters on hybrid host architectures Fix #1064 by adding support for hardware performance counters on hybrid architectures like Intel Alder Lake. Hybrid architectures have multiple types of cores, each of which requires the instantiation of a separate performance counter. The KVM CPU's PerfKvmCounter class was not aware of this, and only instantiated a single performance counter, implicitly bound to the P-core only. This meant that if gem5 ever ran on an E-core, the various hardware performance counters would not get updated properly, in some cases always zero (e.g., for the number of instructions executed). This patch adds support for hybrid host architectures as follows. First, we convert PerfKvmCounter into an abstract class, which has two concrete implementations: SimplePerfKvmCounter and HybridPerfKvmCounter. The former is used for non-hybrid architectures or for non-hardware performance counters and is functionally equivalent to the prior implementation of PerfKvmCounter. The latter is used for instantiating hardware performance counters (i.e., of type PERF_TYPE_HARDWARE) on hybrid host architectures. It does so by internally instantiating two SimplePerfKvmCounters, one for a P-core and one for an E-core. Upon read, it sums the results of reading the two internal counters. Change-Id: If64fcb0e2fcc1b3a6a37d77455c2b21e1fc81150	2024-04-24 09:47:46 -07:00
Harshil Patel	d548f2c5c4	tests: fix tests that use JSON client - There was a bug in JSONClient when searching for resources. The id was not checked and the booleans were not set to true when optional search queries like resource_version and gem5_version are not passed. Change-Id: I4aa7c5388035144ec6864d57130ad09e6709692e	2024-04-23 16:24:09 -07:00
Harshil Patel	97a0530452	stdlib: Enable bundled resource requests from the databases (#779 )	2024-04-22 11:53:23 -07:00
Bobby R. Bruce	13f85b989f	stdlib: Fix obtaining of Simpoint Resources Change-Id: Ic73547c8c4acbe5d8a30a24dd8709cb2e9f6eb5e	2024-04-19 01:54:42 -07:00
Ivana Mitrovic	42ffa52907	mem-ruby: Implement no_alloc Far Atomics in CHI (#994) This PR introduces a missing piece of the far atomic implementation. This pull request incorporates several changes: - Enable 2-level and 4-level (and N-level) cache hierarchies, removing Atomic_NoWait transactions - Fix Unique Near policy implementation that raised abort - Add support for alloc_on_atomic == False. Enables Far Atomics on systems where the HNF does not allocate evicted lines at LLC (Like in WriteUpdate).	2024-04-18 11:35:47 -07:00
Ivana Mitrovic	c44b8635ab	arch-x86: Movfp account for dataSize=4 (#1024 ) Movfp instruction did not account for only copying the lower half of src register if dataSize is 4. GitHub Issue: #893 I used the test code in issue #893 to verify the fix is working.	2024-04-18 10:36:00 -07:00
Bartek Gąsiorzewski	84cba2a8a8	dev: Fix interrupt logic in uart8250 (#1009) Hi, we've noticed some issues with the Uart8250 device when using it as the Linux console. Sometimes the Uart interrupt would remain constantly posted, so Linux would continue to try and handle it, effectively resulting in an infinite loop. With this patch, I'm no longer seeing any issues, but my testing has been limited to configurations and workloads we're interested in at Imagination, so please let me know if there are other tests I should run or if you notice any other issues. This patch fixes several issues with interrupt posting and clearing in the uart8250 device. The "status" member variable and the console interrupt should be kept in sync. However, in one code path in readIir, the interrupt bit was being cleared in the status variable but not in the platform controller. Additionally, in some code paths, the interrupts would be cleared in the status variable and in the interrupt controller, but a future interrupt would remain scheduled, causing a spurious interrupt and setting a bit in status to 1. These issues can confuse the kernel and result in an infinite interrupt handling loop. Another issue is related to the fact that there are two interrupt causes (TX and RX) and both of them can be valid at the same time. When one of them becomes no longer valid, we should check the status of the other one before clearing the interrupt. This patch addresses the issues listed above and refactors the interrupt clearing logic to reduce repetition.	2024-04-17 11:27:39 -07:00
Jason Lowe-Power	c13aa7727d	cpu: Fix Ruby/x86 pio port connections (#1035 ) Fixes #1033 In the BaseCPU object _uncached_interrupt_response_ports is a class variable, not an instance variable. #1004 changed the explicit self._uncached_interrupt_response_ports to use extend. This caused the list of ports to be extended for all cores, which caused problems when using a system with more than 1 core. This reverts the `extend` part of the change, but keeps the rest. Change-Id: I6dc7d6da6763048d82960229d34933a3a2ac36e0 Signed-off-by: Jason Lowe-Power <jason@lowepower.com>	2024-04-17 08:20:04 -07:00
Lukas Zenick	01a5edc86e	arch-x86: Use mbits function for clarity Change-Id: I577ee55752f917e561e4c741ba7a19f0229318b5	2024-04-15 22:49:41 -05:00
Matthew Poremba	a03319bef7	arch-vega: Fix output warnings, gem5.fast (#1023 ) Fix gem5.fast build not building when using gpu model. Removes very spammy stat distribution bucket size prints when running gpu model.	2024-04-15 13:18:27 -07:00
Matthew Poremba	7e2d8dee42	mem,gpu-compute: Implement GPU TCC directed invalidate (#1011) The GPU device currently supports large BAR which means that the driver can write directly to GPU memory over the PCI bus without using SDMA or PM4 packets. The gem5 PCI interface only provides an atomic interface for BAR reads/writes, which means the values cannot go through timing mode Ruby caches. This causes bugs as the TCC cache is allowed to keep clean data between kernels for performance reasons. If there is a BAR write directly to memory bypassing the cache, the value in the cache is stale and must be invalidated. In this commit a TCC invalidate is generated for all writes over PCI that go directly to GPU memory. This will also invalidate TCP along the way if necessary. This currently relies on the driver synchronization which only allows BAR writes in between kernels. Therefore, the cache should only be in I or V state. To handle a race condition between invalidates and launching the next kernel, the invalidates return a response and the GPU command processor will wait for all TCC invalidates to be complete before launching the next kernel. This fixes issues with stale data in nanoGPT and possibly PENNANT.	2024-04-15 13:18:01 -07:00
Giacomo Travaglini	bdcffdd0f0	dev-arm: Do not mark the MpamMSC as abstract (#1030 ) This prevents its instantiation Change-Id: I775a64904a01cf36e4cc1e0cd45765f03325c5ca Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-04-15 09:40:22 -07:00
Lukas Zenick	d67a7797d2	arch-x86: Movfp account for dataSize=4 Change-Id: I97e7a6f2738a57cad9907ddfe5c8030a26c147e8	2024-04-14 15:59:24 -05:00
Matthew Poremba	3db6e86fea	arch-vega: Fix string check warnings on fast build gem5.fast does not currently build if the GPU model is built. This fixes the array-bounds warnings allowing gem5.fast to build again. Change-Id: I463c2847c3ecfd2257a70418fa247090b0493f9b	2024-04-14 12:22:57 -07:00
Matthew Poremba	01f2df4b8a	gpu-compute: Fix stat bucket sizes Change-Id: If30505515867a866c631cb117d3d22e19814a2f2	2024-04-13 15:51:41 -07:00
Yu-Cheng Chang	ebb70dea99	cpu: Fix KVM false negative warning after Kconfig transition (#1013) When we start to build gem5, we read and process all of the SConsopts files, and then process the after_sconsopts_callbacks once all of the SConsopts files have been read. The KVM_ISA env can be set in different files; take x86 and arm as an example: KVM_ISA default value: `bc39283451/src/cpu/kvm/SConsopts` x86 KVM_ISA: `bc39283451/src/arch/x86/kvm/SConsopts (L39-L45)` arm KVM_ISA: `bc39283451/src/arch/arm/kvm/SConsopts (L35-L36)` We should move the KVM warning to after all of the SConsopts env settings have been read. Issue: https://github.com/gem5/gem5/issues/686 Change-Id: I096c6bebaaec18f9b2af93191d0dd23c65084eda	2024-04-12 09:23:56 -07:00
Nicholas Mosier	bc39283451	cpu-o3, arch-x86: initialize interrupts for all SMT threads (#1007 ) Fix issue #1004. When enabling SMT with the O3 cpu, only the first interrupts object was getting initialized properly. This patch initializes all interrupts objects, one per SMT thread. Change-Id: I300782b645bd8ea3ef2497278fb73125ab4bf495	2024-04-11 11:17:24 -07:00
Ivana Mitrovic	db1c336237	cpu,arch-arm,arch-riscv: adding new instruction types to RISC-V (#589) This commit adds more detailed instruction types for RISC-V Vector. Concretely, it substitutes VectorIntegerArith, VectorFloatArith, VectorIntegerReduce and VectorFloatReduce with more specific types related to the operation that each instruction performs (e.g., VectorIntegerAdd or VectorIntegerMult). Additionally, it fixes two RISC-V instruction types (VectorXXX) that were used in ARM SVE, placing the proper SimdXXX ones. Change-Id: I31774fa6a7cd249abfffec68d11d3d77f08ad70b CC @adriaarmejach	2024-04-11 10:15:56 -07:00
Giacomo Travaglini	3b5ae7b4d1	Add a generic cache template library (#745 ) Add a generic cache template to construct internal storage structures. Also add some example use cases by converting the prefetcher tables to use this new library.	2024-04-11 08:00:34 +01:00
Pranith Kumar	769f750eb9	mem-cache: Implement AssociativeSet from AssociativeCache AssociativeSet can reuse most of the generic cache library code with the addition of a secure bit. This reduces duplicated code. Change-Id: I008ef79b0dd5f95418a3fb79396aeb0a6c601784	2024-04-10 16:17:57 -04:00
Pranith Kumar	f3bc10c168	mem-cache: Derive tagged entry from cache entry The tagged entry can be derived from the generic cache entry and add the secure flag that it needs. This reduces code duplication. Change-Id: I7ff0bddc40604a8a789036a6300eabda40339a0f	2024-04-10 16:17:57 -04:00
Pranith Kumar	8fb3611614	mem-cache: prefetch: Implement DCPT tables using cache library The DCPT table is better built using the generic cache library since we do not need the secure bit. Change-Id: I8a4a8d3dab7fbc3bbc816107492978ac7f3f5934	2024-04-10 16:17:57 -04:00
Pranith Kumar	2c7d4bed66	mem-cache: Implement VFT tables using cache library The frequency table is better built using the generic cache library instead of the AssociativeSet since the secure bit is not needed for this structure. Change-Id: Ie3b6442235daec7b350c608ad1380bed58f5ccf4	2024-04-10 16:17:57 -04:00
Pranith Kumar	2cc2ad5097	misc: Add a generic cache library Add a generic cache library modeled after AssociativeSet that can be used for constructing internal caching structures. Change-Id: I1767309ed01f52672b32810636a09142ff23242f	2024-04-10 16:17:57 -04:00
Giacomo Travaglini	4b98551aaf	Update src/python/gem5/components/cachehierarchies/abstract_cache_hierarchy.py Co-authored-by: Bobby R. Bruce <bbruce@ucdavis.edu>	2024-04-10 16:17:56 -04:00
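
Note on commit c13aa7727d: its message says the port list is a class variable rather than an instance variable, so calling `extend` on it grew the list for every core. The Python sketch below illustrates that general language pitfall only. The class names, attribute names, and port strings are hypothetical and are not taken from gem5's BaseCPU code.

```python
class SharedListCPU:
    # Class attribute: a single list object shared by every instance.
    ports = []

    def add_ports(self, new_ports):
        # extend() mutates the shared list in place, so every
        # instance sees the ports added by every other instance.
        self.ports.extend(new_ports)


class PerInstanceCPU:
    # Same class-level default as above.
    ports = []

    def add_ports(self, new_ports):
        # Assignment creates an instance attribute that shadows the
        # class attribute, leaving the class-level list untouched.
        self.ports = self.ports + new_ports


shared0, shared1 = SharedListCPU(), SharedListCPU()
shared0.add_ports(["cpu0.pio"])
shared1.add_ports(["cpu1.pio"])

own0, own1 = PerInstanceCPU(), PerInstanceCPU()
own0.add_ports(["cpu0.pio"])
own1.add_ports(["cpu1.pio"])

print(shared0.ports)  # both cores' ports end up in the shared list
print(own0.ports, own1.ports)  # each core keeps only its own ports
```

With a single core the two variants behave the same, which would be consistent with the message's remark that problems appeared only on systems with more than one core.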

15070 Commits
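
Note on commit cf5ec880c9: its message describes halving the sample period per counter and re-arming with the remaining count after each early overflow. The following Python sketch is a toy simulation of that loop under stated assumptions: events land on a P-core or E-core counter at random, each counter is armed with half of the remaining count (at least 1), and an overflow fires when either counter reaches its period. The function name and the random migration model are hypothetical; this is not the gem5 implementation.

```python
import random


def run_until(n_events, rng):
    """Simulate re-arming halved periods until n_events are observed.

    Returns (total events observed, number of overflow triggers).
    """
    observed = 0
    overflows = 0
    while observed < n_events:
        remaining = n_events - observed
        # Assumed arming rule: half the remaining count per core type.
        period = max(1, remaining // 2)
        p_count = 0
        e_count = 0
        # Each event is counted by whichever core type the host
        # thread happens to be running on (modelled as a coin flip).
        while p_count < period and e_count < period:
            if rng.random() < 0.5:
                p_count += 1
            else:
                e_count += 1
        # One counter reached its period; the other is strictly below
        # it, so the sum never exceeds the remaining count.
        observed += p_count + e_count
        overflows += 1
    return observed, overflows


total, triggers = run_until(1000, random.Random(0))
print(total, triggers)
```

In this model the sum of the two counters at an overflow is at most `2 * period - 1`, which is why the loop lands exactly on the requested count rather than overshooting it.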