derek/gem5

Author	SHA1	Message	Date
Yangyu Chen	b0d81ec8a2	arch-riscv: fix GDB breakpoint issue for RV32 (#1470 ) Since PR #1316, we use sign-extend for all address generation, including PC, to match the ISA specification for modifiable XLEN. However, when we set a breakpoint using remote GDB, our address is not sign-extended. This causes the breakpoint to be set at the wrong address, as specified in Issue #1463. This PR fixes the issue by sign-extending the address when setting a breakpoint. This also matches the RISC-V ISA Specification that "must sign-extend results to fill the entire widest supported XLEN in the destination register." Change-Id: I9b493bf8ad5b1ef45a9728bb40fc5e38250fe9c3 Signed-off-by: Yangyu Chen <cyy@cyyself.name>	2024-08-19 10:25:39 -07:00
Yu-Cheng Chang	aa4fe362a5	arch-riscv: Sign-extend the address in newPCState (#1471 ) From #1316, creating the new PCState should sign-extend the address to avoid wrong address issue. Change-Id: I884b4e3708f5f1cc49cfd44d51bec5a2b63cc47a	2024-08-19 08:21:42 -07:00
Giacomo Travaglini	280871245b	arch-arm: Redirect VHE for ZCR_EL1 (#1472 ) Change-Id: Iff83d25257065503dc02728461823bc9985dbab3 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-08-16 22:49:49 +01:00
Alexander Richardson	646f994efb	arch-arm: Fix incorrect operation of VRINT* instructions (#1325 ) After a lot of debugging and comparing traces I noticed that vrintp was giving different results from QEMU. An input of 0x3f800000 (1.0) was being passed to the fplib helpers as (uint32_t)1 which has a completely different floating-point interpretation and the result was therefore completely wrong. I've fixed this as well as all remaining implicit float-to-int conversions in the ARM instruction execution. There are more -W(implicit-)float-conversion warnings in the other executors, but for now this fixes the issue I was seeing. Change-Id: Ifdeee745ca155d7f4504ac4c54235ac431acdeb9	2024-08-15 11:01:48 +01:00
Setu	629bf84e10	mem: Stride Prefetcher Fix (#1449 ) This PR fixes the issues mentioned in #1448. Note that this contribution is the result of a joint collaboration with @AbhishekUoR This PR introduces the following 4 changes: 1. It changes the addresses which are used to compute the stride to cache line aligned addresses (the current version uses word aligned addresses) 2. It correctly returns if the stride does not match (as opposed to issuing prefetches using the new stride incorrectly) 3. It returns if the new stride is 0, indicating multiple reads from the same cache line. 4. It removes code which is no longer necessary after the addition of changes number 1 and 3. Change-Id: Ic346d0e15df6d07e2b93289c8d6b89b4c2f45a34 --------- Co-authored-by: Abhishek Shailendra Singh <abs218@leigh.edu>	2024-08-14 07:16:10 -07:00
Alexander Richardson	f6f547fb62	arch-arm: Fix incorrect behaviour of VFNMS and VFNMA (#1420 ) This was found while comparing a diverging execution against QEMU traces and checking for the first mismatched program counter. Fortunately this was caused by a branch shortly after this incorrect computation but still took a long time to track down. There are two issues here: the decoder had inverted the cases for S and A, and the sign bit was wrong for VFN*.	2024-08-13 09:05:52 +01:00
Matthew Poremba	c359b53a19	arch-vega: Update microscaling format scaling and denorm handling (#1451 ) This PR has 3 commits: - Update scaling methods to scale by multiplication or division when upcasting or downcasting respectively. - Preserve the sign when a microscaling conversion results in NaN or infinity to match hardware. - Rework rounding to handle cases where conversion results in a denormal number in the output type so that the value is correct.	2024-08-12 07:00:26 -07:00
Matthew Poremba	7d46c50663	arch-vega: Swizzle multi-dword scratch requests (#1445 ) Scratch memory requests that are larger than one dword are using a different memory layout than global instructions. Rather than being placed contiguously, each dword is interleaved 64 lanes * 4 bytes away as described in Section 9.1.5.2. "Swizzled Buffer Addressing" in the MI300 specification. This was verified by comparing MI300 output (which uses scratch_ instructions) with MI200 (which uses buffer instructions). MI300 FashionMNIST bs=1 now matches CPU reference. This requires several changes to the instruction implementations: - For stores, data in the GPUDynInst can be swizzled before the data is written to memory. This is easy to do using a helper method. This is done in the template<int N> variant of initMemWrite. To use this x2 stores are changed to use template<int N> rather than loading a U64. The swizzle function is renamed to swizzleAddr to avoid confusion with swizzleData. - For loads, data is unswizzled in completeAcc when writing register values. This is not as easy to implement as a helper and is thus implemented for the three load instructions that load more than one dword. - Accessing swizzled data requires at least one packet per dword. A new GPU memory helper is added to create these packets for scratch requests specifically. This is called in the template<int N> variant of initMemRead / initMemWrite. Loads and stores of x2 are changed to use this variant instead of accessing a U64. The GPUDynInst status vector restrictions are increased to allow for swizzled x4 accesses. For simplicity this does not currently support misaligned swizzled accesses and will panic upon seeing such a case. Change-Id: Ic686c51e28e0af029a043d5a5b3d4069f2cb94f9	2024-08-12 06:58:48 -07:00
Matthew Poremba	62a2c09d4b	arch-vega: Rework rounding for microscaling conversions The current implementation does not correctly convert subnormal numbers (numbers that fill the underflow gap around zero in floating-point arithmetic). This commit reworks the rounding code to get correct results. First, the min_exp is set to 0, which allows numbers to become subnormal when rounding. Second, the rounding code now uses something closer to "GRS" rounding (guard, round, sticky), which represent the first bit removed when rounding to a smaller type, the second bit removed, and whether any of the other bits removed are one. More details can be found in the code comments. Change-Id: Idcd2f1e4383e4012fc3abf73b1f73c847d44f67b	2024-08-10 10:23:07 -07:00
Matthew Poremba	bdba981753	arch-vega: Preserve sign of NaN/Inf for microscaling types The implementation of microscaling formats uses the Open Compute Project specification which includes a sign bit for NaN and infinity. This should be preserved when a conversion results in NaN or infinity. Change-Id: Id9e99324c6486e256c699016aff301d5f06814d5	2024-08-10 10:23:07 -07:00
Matthew Poremba	c1251f51c1	arch-vega: Introduce two scaling methods for microscaling types Currently there is only a scale() method which multiplies a microscaling type by an int8 value. This should only be applied when upcasting to a larger type after conversion to match hardware. When downcasting to a smaller type, the scaling method should divide by the int8 value before conversion. This commit adds both scaling methods. Change-Id: Ibafa8caa389cde4df609e536cd53bd2289959420	2024-08-10 10:23:07 -07:00
Robert Hauser	e980780efd	arch-riscv: Extend wfi behavior (#1364 ) At the moment, a hart does not halt if there are pending interrupts. However, an implementation can also consider the enable status of the individual interrupts, i.e., a halted hart would only resume if there are locally enabled pending interrupts. This commit introduces this behavior. The wfi behavior is controlled by the new configuration variable wfi_pending_resume of RiscvISA. Change-Id: I316239f9732c6e73e6ad692491bca08d773dd995 --------- Signed-off-by: Robert Hauser <robert.hauser@uni-rostock.de>	2024-08-09 11:28:15 -07:00
Marleson Graf	b8001a861b	mem-ruby,sim-se: Clear LL/SC locks after functional writes (#1404 ) Functional writes atomically update all copies of a data block, so they should invalidate any pending LL/SC locks, just like a conventional write would. Change-Id: Ic79d2d8d24901f1b6a2ce81dc0e2decc84c0ebbc	2024-08-09 09:30:37 -07:00
MMysore2	33e3bc4ff1	Updating Traffic Generators (#1416 ) Added documentation for `strided_generator.py` and `strided_generator_core.py`. Updated clarity of documentation for `linear_generator.py`, `linear_generator_core.py`, `random_generator.py`, and `random_generator_core.py`. Made `max_addr` exclusive instead of inclusive for strided and linear traffic generation in `strided_gen.cc` and `linear_gen.cc`.	2024-08-08 12:46:10 -07:00
Matthew Poremba	85c48a36ec	dev-amdgpu: Fix issues found by address sanitizer (#1430 ) These commits primarily fix the SDMA engine which was (1) using pointer arithmetic on a variable returned by new and then attempting to free the modified pointer and (2) using a buffer after it was freed due to the DMA device calling completion event before Ruby actually completed. Some minor fixes are included: Stop using uninitialized value as packet context and using same request pointer for two separate packets for GPU invalidations.	2024-08-08 11:14:50 -07:00
Yangyu Chen	ce07203c5f	arch-riscv: use sign-extend for all address generation (#1316 ) In gem5, we use the same code base for RISC-V 32 and 64. However, if we need to allow modifiable XLEN control on CSR.mstatus in the future, we should follow the RISC-V ISA manual to sign-extend all the register results, including PC and GPR. If this feature is implemented, the simulator needs to handle user-mode in RV32 while CSR.SATP is set to Sv39. In this case, 0x80000000 and 0xffffffff80000000 are different addresses in the 64-bit S-Mode perspective, but they are the same in the 32-bit U-Mode perspective. We should prevent this wrong behavior before we implement this feature. Thus, we need to sign-extend the results of all the addresses, including the PC and memory addresses, which currently use zero-extend. As specified in the RISC-V ISA manual, we use zero-extend in narrow XLEN mode for the physical address implemented in TLB. Changes based on spec: 1. Sign-extend narrow XLEN: https://github.com/riscv/riscv-isa-manual/blob/riscv-isa-release-b7a445a-2024-07-02/src/machine.adoc?plain=1#L567 2. Zero-extend physical address: https://github.com/riscv/riscv-isa-manual/blob/riscv-isa-release-b7a445a-2024-07-02/src/supervisor.adoc?plain=1#L1670 Signed-off-by: Yangyu Chen <cyy@cyyself.name>	2024-08-08 08:41:35 -07:00
Matthew Poremba	84fedecafe	gpu-compute: Update Requests for invalidations The SQC and TCC invalidations share a Request pointer which they both modify. This can cause some problems, so use a different request pointer for each invalidate. The setContext call is also removed as the value being assigned to it is uninitialized. Change-Id: I82ea7aa44a4f4515c1560993caa26cc6a89355af	2024-08-07 14:37:49 -07:00
Matthew Poremba	db0d5f19cf	dev-amdgpu: Add cleanup events for SDMA SDMA packets which use dmaVirtWrites call their completion event before the write takes place in the Ruby protocol. This causes a use-after-free issue, corrupting random memory locations and leading to random errors. This commit adds a cleanup event for each packet that uses DMA and sets the cleanup latency as 10000 ticks. In atomic mode, the writes complete exactly 2000 ticks after the completion event is called and therefore a fixed latency can be used. This is not tested with timing mode, which does not work with GPUFS at the moment, so a warning is added to give an idea where to look in case the same issue occurs once timing mode is supported. Change-Id: I9ee2689f2becc46bb7794b18b31205f1606109d8	2024-08-07 14:37:49 -07:00
Matthew Poremba	0d0b68266c	dev-amdgpu: Fix bad free in SDMA The SDMA engine copies data in chunks. It currently uses the pointer returned from new[] and manipulates it using pointer arithmetic. This modified pointer is then passed to the completion function which deletes the pointer. Since it is not the original pointer allocated by new[] this triggers issues in ASAN. Change-Id: I03ccf026633285e75005509445c62fcbda8eb978	2024-08-07 12:54:45 -07:00
Saili Karkare	bd228af5cf	Updating hex addr printing (#1385 ) This change changes the addresses that are printed when TrafficGen DebugFlag is enabled. Previously, hex strings were printed without a preceding 0x. This change fixes that to distinguish between decimal and hex.	2024-08-07 02:31:21 -07:00
Erin Le	6dbe2bca7b	mem: Add constexprs to spatio_temporal_memory_streaming.cc Change-Id: I6fa3d9f9a9d89d59d9ec1fc97c152bea3059f87d	2024-08-06 00:06:38 +00:00
Erin Le	f325949ba5	mem: remove stray comment from signature_path_v2.cc Change-Id: I5ddd2ddd6a9cb4fb032b48870c5ef6b0dc9533c0	2024-08-05 23:10:10 +00:00
Erin Le	2db021b27b	mem: Comment removal and adding constexpr to is_secure bools This commit removes some comments and adds constexpr in front of "bool is_secure..." in pif.cc, signature_path.cc, and signature_path_v2.cc Change-Id: Icafe1d7c97d1d3fbf6abc12ba87ebb596255b96f	2024-08-05 15:43:40 -07:00
Erin Le	9adf44ed1f	mem: use is_secure instead of hardcoded false in prefetcher crash This modifies the crash fix so that the function calls that were modified use a local variable called `is_secure` instead of a hardcoded `false`. Some of these existed previously so it made more sense to use them, while others were newly added in to mark where the code might need to be changed later. Change-Id: I0c0d14b74f0ccf70ee5fe7c8b01ed0266353b3c1	2024-08-05 15:43:40 -07:00
Erin Le	b0756bedba	mem: Fix "Need is_secure arg" prefetcher crash This commit fixes the "Need is_secure arg" crash that occurs when using the IndirectMemoryPrefetcher, SignaturePathPrefetcher, SignaturePathPrefetcherV2, STeMSPrefetcher, and PIFPrefetcher. This was done by changing some variables to be AssociativeSet<...> instead of AssociativeCache<...> and changing the affected function calls. Change-Id: I61808c877514efeb73ad041de273ae386711acae	2024-08-05 15:43:40 -07:00
Yu-Cheng Chang	5df08fdb08	arch-riscv: Move pmpReset implementation to MMU::reset() (#1406 ) The PMP is part of the RISC-V MMU subsystem, so it should be put in RiscvISA::MMU::reset()	2024-08-05 14:21:48 -07:00
Matt Sinclair	edd73bd330	gpu-compute: fix typo in GPUMem debug print (#1412 ) The GPUMem print for when a memstatus request completes accidentally put a newline before the word "complete", causing complete to print on a newline and cause confusion. This commit resolves that.	2024-08-05 12:44:13 -07:00
Matt Sinclair	ba455e2025	gpu-compute: update GPUKernelInfo print to print WG number (#1413 ) Whenever a GPU kernel is launching a new WG, the GPUKernelInfo debug flag will print that the kernel is being launched, without the context of which WG from that kernel is being launched. This has caused some confusion to users, who think the entire kernel is being launched repeatedly. To resolve this confusion, update this print to make it clear which WG is being launched when this print is enabled.	2024-08-05 12:43:41 -07:00
Giacomo Travaglini	d2c8754ab3	mem: Fix name() helper for DRAM rank (#1410 ) At the moment the method simply returns the rank number. This is not particularly useful when enabling debug flags as the beginning of the line prints something like: 1: <debug_message> whereas it should really be: system.dram.rank1: <debug_message> Change-Id: I0136dc3d182afa4ae2e5a719cb366d8d0f444667 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-08-03 22:49:59 +01:00
Alexander Richardson	267817eaa1	arch-riscv: Fix implicit int-to-float conversion in .isa files (#1319 ) Explicitly convert to float/double to fix compiler warnings that I have turned on locally.	2024-07-31 04:24:54 -07:00
Erin (Jianghua) Le	2bfafa726f	sim: Add error message for kernel exceeding memory size (#1329 ) This commit adds an error message to src/sim/kernel_workload.cc to tell the user when the end address of the kernel is greater than the size of memory. The error message also specifies the minimum memory size needed to fit the kernel. Change-Id: I7d8f50889ed8172f64b84f98301a35e5f2f352d3	2024-07-30 19:39:41 -07:00
Yu-Cheng Chang	c13f895af0	arch,cpu: Implement generic reset method for MMU (#1342 ) Implementing a generic reset method for the MMU allows each ISA to implement its own reset method. The default MMU reset method flushes all TLB entries. For example, RISC-V needs to do a PMP reset when it receives the reset signal, but the TLBs do not need to be flushed. Change-Id: I158261570fb6e5216ec105fbdc53460f83f88d15	2024-07-30 09:47:55 +01:00
Alexander Richardson	b64aa0b9b3	arch: Dump semihosting write buffer in debug output (#1389 ) This makes it easier to debug unexpected semihosting outputs (in my case a wrong buffer argument was being passed). Change-Id: I342610a92fb8efe121d030f7b9ea3307efc4fec3	2024-07-30 09:39:05 +01:00
Alexander Richardson	b23a4c7806	arch-arm: Add support for AArch32 PMEVCNTR/PMEVTYPER/PMCCFILTR (#1388 ) These registers were only handled in AArch64 mode but are also accessible as c14 registers for AArch32. Change-Id: I62fe54427e96265df0589308afa1b5d665dbf210	2024-07-29 18:22:00 +01:00
Alexander Richardson	b51927e7a8	arch-arm: return 64-bit cycle counter for MISCREG_PMCCNTR (#1390 ) In AArch32 mode it is possible to read a 64-bit counter using mrrc. Instead of truncating in the PMU code, just allow the instruction implementation to truncate to 32 bits if accessed using mrc. Change-Id: I77620f6d1852a7d9e79c1ecee50f4297b4103b1c	2024-07-29 16:57:48 +01:00
Matthew Poremba	21f6e166b7	arch-vega: Panic on SDWAB / DPP VOPC unimplemented If SDWAB or DPP are used on a VOPC instruction and those are not implemented, it is highly likely to be a problem for the application. Rather than continue to execute and cause undefined behavior, exit the simulation with a panic showing the line of the instruction causing the issue. Change-Id: Ib3f94df7445d068b26907470c1f733be16cd2fc2	2024-07-25 16:18:14 -07:00
Matthew Poremba	b75fe56da5	arch-vega: Panic unimplemented SDWA/DPP for VOP1/VOP2 Add a panic if SDWA or DPP is used for an instruction which does not implement support for it. If an application uses SDWA or DPP it likely does not operate in the same way as the base instruction and therefore gem5 should panic rather than continue. It is likely data is incorrect which will make it more difficult to debug an application. Change-Id: I68ac448b0d62941761ef4efa0169f95796270f48	2024-07-25 16:18:14 -07:00
Matthew Poremba	6558821e2d	arch-vega: Add SDWAB for v_cmp_{eq,ne}_u16 This shows an example of how to use the previous commit which adds an SDWAB helper. The execute() method of both are the same with the exception of the lambda function passed to the helper method. Change-Id: I5ffe361440b4020b9f7669c0ed946aa6b3bbec25	2024-07-25 16:18:14 -07:00
Matthew Poremba	69338703e7	arch-vega: Implement SDWAB helper Implement a SDWAB helper which accepts a dynamic instruction and a lambda function defining a comparison function taking two values and returning a comparison result of 0 or 1 for false or true. Current instructions which implement SDWA do so on a per-instruction basis which adds a lot of redundant code. This allows for generic SDWAB implementations for VOPC instructions. All modifiers are implemented assuming that SDWAB VOPC instruction comparison types may be U32, I32, F32, U16, I16, F16 (which exist) but is extendible to I8, U8, or F8. Change-Id: Idab58a327c29dd19a1a5457237f3799a04f2031b	2024-07-25 16:18:13 -07:00
Matthew Poremba	a7bc4ca19a	arch-vega: Fix unconditional clamps in VOP3 (#1379 ) Some instructions are clamping floating point outputs unconditionally, leading to incorrect results. This commit finds instructions with this issue and checks the clamp bit before applying clamp. Change-Id: Ibc6de3813d81fd4f9d2c98dd497d19dd34cf6bde	2024-07-25 08:06:00 -07:00
Matthew Poremba	7dae1a1d25	arch-vega: Multiple SOPC fixes (#1366 ) Make S_CMP_LT_U32 use < instead of <=. Change types of EQ / LG for U64 to be U64. Change-Id: Ib0b3b7a46ba1aff16a6d439302ca087d988d6417	2024-07-23 12:45:52 -07:00
Ivana Mitrovic	82c91e8edb	arch-riscv: Improve widening/narrowing vectors overlap check (#1331 ) This PR improves the vector register groups overlap check in widening/narrowing instructions. - Fix wrong illegal overlap condition between VS2 and VD vector register groups. - Also check VS1 vector register group for overlap with VD in vector-vector instructions. - Parametrize widening/narrowing factors in overlap check function to potentially handle more cases. Fixes issue #442.	2024-07-22 10:54:02 -07:00
Erin (Jianghua) Le	b6f8ecb1be	python: move cache coherence protocol check above imports (#1360 ) This commit moves the requires() call that checks the cache coherence protocol above the imports. This change was made for the chi private l1, ruby mesi three level, mesi two level, and mi example cache hierarchies. This ensures that a clear error message about having the wrong coherence protocol is printed, rather than a less useful message. Change-Id: I3bac1ffcb1f8a9d94e486237f880cf248e442ba8	2024-07-22 09:34:04 -07:00
Alexander Richardson	fc59109429	arch,arch-arm: Fix remaining implicit float conversion warnings in .isa (#1327 ) This fixes the remaining implicit int/float conversions and enables the float conversion warnings for clang when building the Arm instruction execution logic. This depends on the previous fixes. Change-Id: I51aac94644a483175842c36da2d49d308aaceb49	2024-07-18 10:43:12 -07:00
Erin (Jianghua) Le	aaa6566548	mem: Change long in src/mem/physical.cc to int64_t (#1275 ) This changes `long`s in src/mem/physical.cc, which are 32 bits or more, to `uint64_t`s, which are exactly 64 bits. Change-Id: I64e089a2ac087bcf58b9c3c918c59dc5ff75d010	2024-07-18 10:12:24 -07:00
Robert Hauser	9b8c84cb5d	arch-riscv: Overwrite getEMI() for timing expr (#1346 ) TimingExpression enables runtime calculation of the commit latency in MinorCPU. For this, machInst is obtained by getEMI() to match it with a given instruction. By default, getEMI() always returns 0 and is therefore overwritten to enable timing expressions for RISC-V. This was already done for ARM (see src/arch/arm/insts/static_inst.hh). Change-Id: I03d669b3439fd24e00cbf893f5db9951dfe56b1f Signed-off-by: Robert Hauser <robert.hauser@uni-rostock.de>	2024-07-12 20:52:24 -07:00
Robert Hauser	5e5e8fb9c6	arch-riscv: Update local interrupts citation (#1347 ) Updated the bib information of the local RISC-V interrupts. Change-Id: I666c3df4529e159bd1946ca1a9623e47f84d5d9e Signed-off-by: Robert Hauser <robert.hauser@uni-rostock.de>	2024-07-12 20:51:49 -07:00
Tommaso Marinelli	e3b41291da	arch-riscv: Check VS1 group for overlap when widening/narrowing Currently, only the VS2 register group is checked for overlap with VD when executing a widening/narrowing instruction. This commit extends the check to VS1, when applicable (i.e. vector-vector operations). Change-Id: I892b7717c01e25546fb41e05afbd08fc40c60c59	2024-07-12 01:17:14 +00:00
Tommaso Marinelli	a8b7e9727d	arch-riscv: Generalize widening/narrowing vectors overlap check As of now, the widening/narrowing vector register groups overlap check always assumes a SEW multiplication factor equal to 2 (for either VD or VS2). This commit aims at making this check more generic. Change-Id: I4311fc3624cd324ccfdf2a1920a19efc85357120	2024-07-12 01:17:14 +00:00
Tommaso Marinelli	5b693fd8b6	arch-riscv: Remove duplicate line Change-Id: I32200aad5a59c9fd85f6ed783a4cebb841bf6ff1	2024-07-12 01:17:14 +00:00
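
Note on the RV32 sign-extension commits (ce07203c5f, aa4fe362a5, b0d81ec8a2): the messages above say that 0x80000000 and 0xffffffff80000000 must be treated as the same RV32 address, and that a breakpoint address arriving from remote GDB has to be sign-extended before it is compared with the PC. The following is only a minimal sketch of that arithmetic, not gem5 code; the helper names `sign_extend` and `zero_extend` are hypothetical and chosen for illustration.

```python
def sign_extend(value: int, bits: int = 32, width: int = 64) -> int:
    """Sign-extend the low `bits` bits of `value` to a `width`-bit pattern.

    Hypothetical helper for illustration; it is not a gem5 function.
    """
    value &= (1 << bits) - 1            # keep only the narrow-XLEN bits
    sign_bit = 1 << (bits - 1)
    extended = (value ^ sign_bit) - sign_bit  # two's-complement trick, may be negative
    return extended & ((1 << width) - 1)      # return as an unsigned width-bit pattern


def zero_extend(value: int, bits: int = 32) -> int:
    """Zero-extend: simply drop everything above the low `bits` bits."""
    return value & ((1 << bits) - 1)


# An address as a debugger might send it for an RV32 target (not sign-extended).
gdb_addr = 0x80000000

# The PC as the commit messages describe it after #1316 (sign-extended to 64 bits).
pc = sign_extend(gdb_addr)

# Comparing the raw debugger address with the PC fails; sign-extending first matches.
raw_match = (gdb_addr == pc)
fixed_match = (sign_extend(gdb_addr) == pc)
```

Addresses below 0x80000000 are unchanged by either extension, so a mismatch of this kind only shows up for addresses whose bit 31 is set.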

15288 Commits