15288 Commits

Author SHA1 Message Date
Yangyu Chen
b0d81ec8a2 arch-riscv: fix GDB breakpoint issue for RV32 (#1470)
Since PR #1316, we use sign-extend for all address generation, including
PC, to match the ISA specification for modifiable XLEN. However, when we
set a breakpoint using remote GDB, our address is not sign-extended.
This causes the breakpoint to be set at the wrong address, as specified
in Issue #1463. This PR fixes the issue by sign-extending the address
when setting a breakpoint. This also matches the RISC-V ISA
Specification that "must sign-extend results to fill the entire widest
supported XLEN in the destination register."

Change-Id: I9b493bf8ad5b1ef45a9728bb40fc5e38250fe9c3

Signed-off-by: Yangyu Chen <cyy@cyyself.name>
2024-08-19 10:25:39 -07:00
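
As an illustration of the sign-extension described in b0d81ec8a2, here is a minimal Python sketch (not gem5 code). The names `sign_extend`, `insert_breakpoint`, and `hits_breakpoint` are hypothetical; the sketch only assumes that the simulator compares a 64-bit sign-extended PC against the stored breakpoint addresses.

```python
def sign_extend(value: int, from_bits: int = 32, to_bits: int = 64) -> int:
    """Sign-extend the low `from_bits` of `value` into a `to_bits`-wide unsigned integer."""
    value &= (1 << from_bits) - 1
    sign_bit = 1 << (from_bits - 1)
    # XOR-then-subtract yields a negative number when the sign bit is set.
    extended = (value ^ sign_bit) - sign_bit
    return extended & ((1 << to_bits) - 1)


# Hypothetical breakpoint table keyed by the sign-extended address.
breakpoints = set()


def insert_breakpoint(addr_from_gdb: int) -> None:
    # The address sent by a remote debugger for RV32 is assumed to be 32 bits wide.
    breakpoints.add(sign_extend(addr_from_gdb))


def hits_breakpoint(pc: int) -> bool:
    # `pc` is assumed to already be sign-extended by address generation.
    return pc in breakpoints
```

Without the `sign_extend` call in `insert_breakpoint`, a breakpoint at 0x80001000 would never match a PC of 0xffffffff80001000.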
Yu-Cheng Chang
aa4fe362a5 arch-riscv: Sign-extend the address in newPCState (#1471)
Following #1316, creating a new PCState should sign-extend the address to
avoid wrong-address issues.

Change-Id: I884b4e3708f5f1cc49cfd44d51bec5a2b63cc47a
2024-08-19 08:21:42 -07:00
Giacomo Travaglini
280871245b arch-arm: Redirect VHE for ZCR_EL1 (#1472)
Change-Id: Iff83d25257065503dc02728461823bc9985dbab3

Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-08-16 22:49:49 +01:00
Alexander Richardson
646f994efb arch-arm: Fix incorrect operation of VRINT* instructions (#1325)
After a lot of debugging and comparing traces I noticed that vrintp was
giving different results from QEMU. An input of 0x3f800000 (1.0) was
being passed to the fplib helpers as (uint32_t)1 which has a completely
different floating-point interpretation and the result was therefore
completely wrong.

I've fixed this as well as all remaining implicit float-to-int
conversions in the ARM instruction execution. There are more
-W(implicit-)float-conversion warnings in the other executors, but for
now this fixes the issue I was seeing.

Change-Id: Ifdeee745ca155d7f4504ac4c54235ac431acdeb9
2024-08-15 11:01:48 +01:00
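
The difference between reinterpreting a bit pattern and converting a value, which is the root of the bug described in 646f994efb, can be sketched in Python. The helper names are hypothetical and this is not the fplib interface; it only shows why 0x3f800000 and (uint32_t)1 mean very different things to a floating-point helper.

```python
import struct


def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def float_to_bits(x: float) -> int:
    """Return the 32-bit pattern of a single-precision value."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


one_bits = 0x3F800000
value = bits_to_float(one_bits)      # the intended input, 1.0
converted = int(value)               # an implicit float-to-int conversion gives 1
misread = bits_to_float(converted)   # the pattern 0x00000001 is a tiny subnormal
```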
Setu
629bf84e10 mem: Stride Prefetcher Fix (#1449)
This PR fixes the issues mentioned in #1448.

**Note that this contribution is the result of a joint collaboration
with @AbhishekUoR**

This PR introduces the following 4 changes:
1. It changes the addresses which are used to compute the stride to
cache line aligned addresses (the current version uses word aligned
addresses)
2. It correctly returns if the stride does not match (as opposed to
issuing prefetches using the new stride incorrectly)
3. It returns if the new stride is 0, indicating multiple reads from the
same cache line.
4. It removes code which is no longer necessary after the addition of
changes number 1 and 3.

Change-Id: Ic346d0e15df6d07e2b93289c8d6b89b4c2f45a34

---------

Co-authored-by: Abhishek Shailendra Singh <abs218@leigh.edu>
2024-08-14 07:16:10 -07:00
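
The first three changes listed in 629bf84e10 can be sketched as follows. This is a simplified Python model, not the gem5 prefetcher: it leaves out confidence counters and table lookup, and the names `StrideEntry`, `align_to_line`, and `on_access`, as well as the 64-byte line size and the prefetch degree, are assumptions made for illustration.

```python
from typing import List


class StrideEntry:
    """Hypothetical per-stream state: last line address seen and last stride."""

    def __init__(self, line_addr: int):
        self.last_line_addr = line_addr
        self.stride = 0


def align_to_line(addr: int, line_size: int = 64) -> int:
    # Change 1: strides are computed on cache-line-aligned addresses.
    return addr & ~(line_size - 1)


def on_access(entry: StrideEntry, addr: int, degree: int = 2,
              line_size: int = 64) -> List[int]:
    line_addr = align_to_line(addr, line_size)
    new_stride = line_addr - entry.last_line_addr
    entry.last_line_addr = line_addr
    if new_stride == 0:
        # Change 3: repeated reads of the same line produce no prefetch.
        return []
    if new_stride != entry.stride:
        # Change 2: on a mismatch, record the new stride and return.
        entry.stride = new_stride
        return []
    return [line_addr + new_stride * i for i in range(1, degree + 1)]
```

With this model, two consecutive accesses with the same non-zero line stride are needed before any prefetch address is produced.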
Alexander Richardson
f6f547fb62 arch-arm: Fix incorrect behaviour of VFNMS and VFNMA (#1420)
This was found while comparing a diverging execution against QEMU traces
and checking for the first mismatched program counter. Fortunately this
was caused by a branch shortly after this incorrect computation but still
took a long time to track down.

There are two issues here: the decoder had inverted the cases for *S and
*A, and the sign bit was wrong for VFN*.
2024-08-13 09:05:52 +01:00
Matthew Poremba
c359b53a19 arch-vega: Update microscaling format scaling and denorm handling (#1451)
This PR has 3 commits:
- Update scaling methods to scale by multiplication or division when
upcasting or downcasting respectively.
- Preserve the sign when a microscaling conversion results in NaN or
infinity to match hardware.
- Rework rounding to handle cases where conversion results in a denormal
number in the output type so that the value is correct.
2024-08-12 07:00:26 -07:00
Matthew Poremba
7d46c50663 arch-vega: Swizzle multi-dword scratch requests (#1445)
Scratch memory requests that are larger than one dword are using a
different memory layout than global instructions. Rather than being
placed contiguously, each dword is interleaved 64 lanes * 4 bytes away
as described in Section 9.1.5.2. "Swizzled Buffer Addressing" in the
MI300 specification. This was verified by comparing MI300 output (which
uses scratch_ instructions) with MI200 (which uses buffer instructions).
MI300 FashionMNIST bs=1 now matches CPU reference.

This requires several changes to the instruction implementations:
- For stores, data in the GPUDynInst can be swizzled before the data is
written to memory. This is easy to do using a helper method. This is
done in the template<int N> variant of initMemWrite. To use this x2
stores are changed to use template<int N> rather than loading a U64. The
swizzle function is renamed to swizzleAddr to avoid confusion with
swizzleData.
- For loads, data is unswizzled in completeAcc when writing register
values. This is not as easy to implement as a helper and is thus
implemented for the three load instructions that load more than one
dword.
- Accessing swizzled data requires at least one packet per dword. A new
GPU memory helper is added to create these packets for scratch requests
specifically. This is called in the template<int N> variant of
initMemRead / initMemWrite. Loads and stores of x2 are changed to use
this variant instead of accessing a U64.

The GPUDynInst status vector restrictions are increased to allow for
swizzled x4 accesses. For simplicity this does not currently support
misaligned swizzled accesses and will panic upon seeing such a case.

Change-Id: Ic686c51e28e0af029a043d5a5b3d4069f2cb94f9
2024-08-12 06:58:48 -07:00
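
The layout difference described in 7d46c50663 can be illustrated with a small Python sketch. It assumes that each lane already has a computed address for its first dword and that later dwords sit 64 lanes * 4 bytes further on, as the commit message states; the function names are hypothetical and misaligned accesses are not modelled.

```python
WAVEFRONT_LANES = 64
DWORD_BYTES = 4


def contiguous_dword_addrs(lane_addr: int, num_dwords: int) -> list:
    """Layout assumed for global instructions: dwords placed back to back."""
    return [lane_addr + d * DWORD_BYTES for d in range(num_dwords)]


def swizzled_dword_addrs(lane_addr: int, num_dwords: int) -> list:
    """Layout assumed for scratch requests: dwords interleaved across the wavefront."""
    stride = WAVEFRONT_LANES * DWORD_BYTES  # 256 bytes between dwords of one lane
    return [lane_addr + d * stride for d in range(num_dwords)]
```

Because consecutive dwords of one lane are 256 bytes apart, each dword needs its own memory packet, which matches the commit's note about at least one packet per dword.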
Matthew Poremba
62a2c09d4b arch-vega: Rework rounding for microscaling conversions
The current implementation does not correctly convert subnormal numbers
(numbers that fill the underflow gap around zero in floating-point
arithmetic). This commit reworks the rounding code to get correct
results.

First, the min_exp is set to 0, which allows numbers to become
subnormal when rounding. Second, the rounding code now uses something
closer to "GRS" rounding (guard, round, sticky), which tracks the
first bit removed when rounding to a smaller type, the second bit
removed, and whether any of the remaining removed bits are one. More
details can be found in the code comments.

Change-Id: Idcd2f1e4383e4012fc3abf73b1f73c847d44f67b
2024-08-10 10:23:07 -07:00
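
For readers unfamiliar with guard/round/sticky rounding as mentioned in 62a2c09d4b, here is a generic round-to-nearest-even sketch in Python. It is a textbook formulation on an integer mantissa, not the gem5 code (which the commit describes only as "something closer to" GRS); `round_grs` is a hypothetical name, and carry into the exponent is left to the caller.

```python
def round_grs(mantissa: int, shift: int) -> int:
    """Drop the low `shift` bits of `mantissa`, rounding to nearest, ties to even."""
    if shift <= 0:
        return mantissa
    kept = mantissa >> shift
    # Guard: the first (most significant) bit removed.
    guard = (mantissa >> (shift - 1)) & 1
    # Round: the second bit removed, if there is one.
    round_bit = (mantissa >> (shift - 2)) & 1 if shift >= 2 else 0
    # Sticky: set when any of the remaining removed bits is one.
    sticky_mask = (1 << (shift - 2)) - 1 if shift >= 2 else 0
    sticky = int((mantissa & sticky_mask) != 0)
    # Round up when above one half, or exactly one half with an odd result.
    if guard and (round_bit or sticky or (kept & 1)):
        kept += 1
    return kept
```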
Matthew Poremba
bdba981753 arch-vega: Preserve sign of NaN/Inf for microscaling types
The implementation of microscaling formats uses the Open Compute Project
specification which includes a sign bit for NaN and infinity. This
should be preserved when a conversion results in NaN or infinity.

Change-Id: Id9e99324c6486e256c699016aff301d5f06814d5
2024-08-10 10:23:07 -07:00
Matthew Poremba
c1251f51c1 arch-vega: Introduce two scaling methods for microscaling types
Currently there is only a scale() method which multiplies a microscaling
type by an int8 value. This should only be applied when upcasting to
a larger type after conversion to match hardware. When downcasting to a
smaller type, the scaling method should divide by the int8 value before
conversion.

This commit adds both scaling methods.

Change-Id: Ibafa8caa389cde4df609e536cd53bd2289959420
2024-08-10 10:23:07 -07:00
Robert Hauser
e980780efd arch-riscv: Extend wfi behavior (#1364)
At the moment, a hart does not halt if there are pending interrupts.
However, an implementation can also consider the enable status of the
individual interrupts, i.e., a halted hart would only resume if there
are locally enabled pending interrupts. This commit introduces this
behavior. The wfi behavior is controlled by the new configuration
variable wfi_pending_resume of RiscvISA.

Change-Id: I316239f9732c6e73e6ad692491bca08d773dd995

---------

Signed-off-by: Robert Hauser <robert.hauser@uni-rostock.de>
2024-08-09 11:28:15 -07:00
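
The two wake-up policies described in e980780efd can be sketched in Python. The function and its `require_enabled` parameter are hypothetical; how the real `wfi_pending_resume` option maps onto the two policies is not stated in the commit message, so that mapping is left open here.

```python
def wfi_should_resume(mip: int, mie: int, require_enabled: bool) -> bool:
    """Decide whether a hart halted by wfi resumes.

    mip: pending-interrupt bits, mie: locally enabled interrupt bits.
    """
    if require_enabled:
        # Resume only when a pending interrupt is also locally enabled.
        return (mip & mie) != 0
    # Resume on any pending interrupt, regardless of its enable bit.
    return mip != 0


MTIP = 1 << 7  # machine timer interrupt pending bit
```

For example, with `mip = MTIP` and `mie = 0`, the hart resumes under the any-pending policy but stays halted when enable bits are taken into account.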
Marleson Graf
b8001a861b mem-ruby,sim-se: Clear LL/SC locks after functional writes (#1404)
Functional writes atomically update all copies of a data block, so they
should invalidate any pending LL/SC locks, just like a conventional
write would.

Change-Id: Ic79d2d8d24901f1b6a2ce81dc0e2decc84c0ebbc
2024-08-09 09:30:37 -07:00
MMysore2
33e3bc4ff1 Updating Traffic Generators (#1416)
Added documentation for `strided_generator.py` and
`strided_generator_core.py`.

Updated clarity of documentation for `linear_generator.py`,
`linear_generator_core.py`, `random_generator.py`, and
`random_generator_core.py`.

Made `max_addr` exclusive instead of inclusive for strided and linear
traffic generation in `strided_gen.cc` and `linear_gen.cc`.
2024-08-08 12:46:10 -07:00
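
The effect of an exclusive `max_addr`, as introduced in 33e3bc4ff1, can be shown with a short Python sketch. `linear_addresses` is a hypothetical helper, not the generator in `linear_gen.cc`, and it ignores wrap-around and timing.

```python
def linear_addresses(min_addr: int, max_addr: int, block_size: int) -> list:
    """Block addresses in [min_addr, max_addr), i.e. max_addr itself is never issued."""
    return list(range(min_addr, max_addr, block_size))
```

With `min_addr=0`, `max_addr=256`, and `block_size=64`, the last address generated is 192, so the final 64-byte access ends exactly at `max_addr`.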
Matthew Poremba
85c48a36ec dev-amdgpu: Fix issues found by address sanitizer (#1430)
These commits primarily fix the SDMA engine which was (1) using pointer
arithmetic on a variable returned by new and then attempting to free the
modified pointer and (2) using a buffer after it was freed due to the
DMA device calling completion event before Ruby actually completed.

Some minor fixes are included: stop using an uninitialized value as the
packet context, and stop using the same request pointer for two separate
packets for GPU invalidations.
2024-08-08 11:14:50 -07:00
Yangyu Chen
ce07203c5f arch-riscv: use sign-extend for all address generation (#1316)
In gem5, we use the same code base for RISC-V 32 and 64.

However, if we need to allow modifiable XLEN control on CSR.mstatus in
the future, we should follow the RISC-V ISA manual to sign-extend all
the register results, including PC and GPR. If this feature is
implemented, the simulator needs to handle user mode in RV32 while
CSR.SATP is set to Sv39. In this case, 0x80000000 and 0xffffffff80000000
are different addresses from the 64-bit S-Mode perspective, but they are
the same from the 32-bit U-Mode perspective. We should prevent this wrong
behavior before we implement this feature.

Thus, we need to sign-extend the results of all the addresses, including
the PC and memory addresses, which currently use zero-extend. As
specified in the RISC-V ISA manual, we use zero-extend in narrow XLEN
mode for the physical address implemented in TLB.

Changes based on spec:
1. Sign-extend narrow XLEN:
https://github.com/riscv/riscv-isa-manual/blob/riscv-isa-release-b7a445a-2024-07-02/src/machine.adoc?plain=1#L567
2. Zero-extend physical address:
https://github.com/riscv/riscv-isa-manual/blob/riscv-isa-release-b7a445a-2024-07-02/src/supervisor.adoc?plain=1#L1670

Signed-off-by: Yangyu Chen <cyy@cyyself.name>
2024-08-08 08:41:35 -07:00
Matthew Poremba
84fedecafe gpu-compute: Update Requests for invalidations
The SQC and TCC invalidations share a Request pointer which they both
modify. This can cause some problems, so use a different request pointer
for each invalidate. The setContext call is also removed as the value
being assigned to it is uninitialized.

Change-Id: I82ea7aa44a4f4515c1560993caa26cc6a89355af
2024-08-07 14:37:49 -07:00
Matthew Poremba
db0d5f19cf dev-amdgpu: Add cleanup events for SDMA
SDMA packets which use dmaVirtWrites call their completion event before
the write takes place in the Ruby protocol. This causes a use-after-free
issue that corrupts random memory locations, leading to random errors. This
commit adds a cleanup event for each packet that uses DMA and sets the
cleanup latency as 10000 ticks. In atomic mode, the writes complete
exactly 2000 ticks after the completion event is called and therefore a
fixed latency can be used. This is not tested with timing mode, which
does not work with GPUFS at the moment, so a warning is added to give an
idea where to look in case the same issue occurs once timing mode is
supported.

Change-Id: I9ee2689f2becc46bb7794b18b31205f1606109d8
2024-08-07 14:37:49 -07:00
Matthew Poremba
0d0b68266c dev-amdgpu: Fix bad free in SDMA
The SDMA engine copies data in chunks. It currently uses the pointer
returned from new[] and manipulates it using pointer arithmetic. This
modified pointer is then passed to the completion function which deletes
the pointer. Since it is not the original pointer allocated by new[]
this triggers issues in ASAN.

Change-Id: I03ccf026633285e75005509445c62fcbda8eb978
2024-08-07 12:54:45 -07:00
Saili Karkare
bd228af5cf Updating hex addr printing (#1385)
This change updates the addresses that are printed when the TrafficGen
DebugFlag is enabled. Previously, hex strings were printed without a
preceding 0x. This change adds the prefix to distinguish between decimal
and hex.
2024-08-07 02:31:21 -07:00
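
The ambiguity that bd228af5cf removes is easy to reproduce. The following Python sketch uses hypothetical helper names and is unrelated to the actual debug-print code; it only shows why a hex string without a prefix can be misread as decimal.

```python
def format_addr_bare(addr: int) -> str:
    # Old style: hex digits with no prefix.
    return f"{addr:x}"


def format_addr_prefixed(addr: int) -> str:
    # New style: the '#' flag adds the leading 0x.
    return f"{addr:#x}"
```

An address of 4096 prints as `1000` in the bare form, which reads like decimal one thousand, and as `0x1000` in the prefixed form.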
Erin Le
6dbe2bca7b mem: Add constexprs to spatio_temporal_memory_streaming.cc
Change-Id: I6fa3d9f9a9d89d59d9ec1fc97c152bea3059f87d
2024-08-06 00:06:38 +00:00
Erin Le
f325949ba5 mem: remove stray comment from signature_path_v2.cc
Change-Id: I5ddd2ddd6a9cb4fb032b48870c5ef6b0dc9533c0
2024-08-05 23:10:10 +00:00
Erin Le
2db021b27b mem: Comment removal and adding constexpr to is_secure bools
This commit removes some comments and adds constexpr in front
of "bool is_secure..." in pif.cc, signature_path.cc, and
signature_path_v2.cc

Change-Id: Icafe1d7c97d1d3fbf6abc12ba87ebb596255b96f
2024-08-05 15:43:40 -07:00
Erin Le
9adf44ed1f mem: use is_secure instead of hardcoded false in prefetcher crash
This modifies the crash fix so that the function calls that were
modified use a local variable called `is_secure` instead of a
hardcoded `false`. Some of these existed previously so it made
more sense to use them, while others were newly added in to mark
where the code might need to be changed later.

Change-Id: I0c0d14b74f0ccf70ee5fe7c8b01ed0266353b3c1
2024-08-05 15:43:40 -07:00
Erin Le
b0756bedba mem: Fix "Need is_secure arg" prefetcher crash
This commit fixes the "Need is_secure arg" crash that occurs when
using the IndirectMemoryPrefetcher, SignaturePathPrefetcher,
SignaturePathPrefetcherV2, STeMSPrefetcher, and PIFPrefetcher. This
was done by changing some variables to be AssociativeSet<...>
instead of AssociativeCache<...> and changing the affected function
calls.

Change-Id: I61808c877514efeb73ad041de273ae386711acae
2024-08-05 15:43:40 -07:00
Yu-Cheng Chang
5df08fdb08 arch-riscv: Move pmpReset implementation to MMU::reset() (#1406)
The PMP is part of the RISC-V MMU subsystem, so it should be put in
RiscvISA::MMU::reset().
2024-08-05 14:21:48 -07:00
Matt Sinclair
edd73bd330 gpu-compute: fix typo in GPUMem debug print (#1412)
The GPUMem print for when a memstatus request completes accidentally put
a newline before the word "complete", causing "complete" to print on a
new line and causing confusion. This commit resolves that.
2024-08-05 12:44:13 -07:00
Matt Sinclair
ba455e2025 gpu-compute: update GPUKernelInfo print to print WG number (#1413)
Whenever a GPU kernel is launching a new WG, the GPUKernelInfo debug
flag will print that the kernel is being launched, without the context
of which WG from that kernel is being launched. This has caused some
confusion to users, who think the entire kernel is being launched
repeatedly. To resolve this confusion, update this print to make it
clear which WG is being launched when this print is enabled.
2024-08-05 12:43:41 -07:00
Giacomo Travaglini
d2c8754ab3 mem: Fix name() helper for DRAM rank (#1410)
At the moment the method simply returns the rank number. This is not
particularly useful when enabling debug flags as the beginning of the
line prints something like:

1: <debug_message>

whereas it should really be:

system.dram.rank1: <debug_message>

Change-Id: I0136dc3d182afa4ae2e5a719cb366d8d0f444667

Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-08-03 22:49:59 +01:00
Alexander Richardson
267817eaa1 arch-riscv: Fix implicit int-to-float conversion in .isa files (#1319)
Explicitly convert to float/double to fix compiler warnings that I have
turned on locally.
2024-07-31 04:24:54 -07:00
Erin (Jianghua) Le
2bfafa726f sim: Add error message for kernel exceeding memory size (#1329)
This commit adds an error message to src/sim/kernel_workload.cc to tell
the user when the end address of the kernel is greater than the size of
memory. The error message also specifies the minimum memory size needed
to fit the kernel.

Change-Id: I7d8f50889ed8172f64b84f98301a35e5f2f352d3
2024-07-30 19:39:41 -07:00
Yu-Cheng Chang
c13f895af0 arch,cpu: Implement generic reset method for MMU (#1342)
Implementing a generic reset method for the MMU allows each ISA to
implement its own reset method. The default MMU reset method flushes all
TLB entries. For example, RISC-V needs to reset the PMP when it receives
the reset signal, but the TLBs do not need to be flushed.

Change-Id: I158261570fb6e5216ec105fbdc53460f83f88d15
2024-07-30 09:47:55 +01:00
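
The override pattern described in c13f895af0 can be sketched in Python. The classes below are hypothetical stand-ins for the C++ MMU classes; only the idea of a default reset that flushes TLBs and an ISA-specific override comes from the commit message.

```python
class TLB:
    """Hypothetical TLB holding virtual-to-physical mappings."""

    def __init__(self):
        self.entries = {}

    def flush_all(self):
        self.entries.clear()


class BaseMMU:
    def __init__(self):
        self.itb = TLB()
        self.dtb = TLB()

    def reset(self):
        # Default behaviour: flush every TLB entry.
        self.itb.flush_all()
        self.dtb.flush_all()


class RiscvMMU(BaseMMU):
    def __init__(self):
        super().__init__()
        # Placeholder PMP state, for illustration only.
        self.pmp_entries = [("rwx", 0x80000000)]

    def reset(self):
        # ISA-specific behaviour: reset the PMP, leave the TLBs untouched.
        self.pmp_entries.clear()
```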
Alexander Richardson
b64aa0b9b3 arch: Dump semihosting write buffer in debug output (#1389)
This makes it easier to debug unexpected semihosting outputs (in my case
a wrong buffer argument was being passed).

Change-Id: I342610a92fb8efe121d030f7b9ea3307efc4fec3
2024-07-30 09:39:05 +01:00
Alexander Richardson
b23a4c7806 arch-arm: Add support for AArch32 PMEVCNTR*/PMEVTYPER*/PMCCFILTR (#1388)
These registers were only handled in AArch64 mode but are also
accessible as c14 registers in AArch32.

Change-Id: I62fe54427e96265df0589308afa1b5d665dbf210
2024-07-29 18:22:00 +01:00
Alexander Richardson
b51927e7a8 arch-arm: return 64-bit cycle counter for MISCREG_PMCCNTR (#1390)
In AArch32 mode it is possible to read a 64-bit counter using mrrc.
Instead of truncating in the PMU code, just allow the instruction
implementation to truncate to 32 bits if accessed using mrc.

Change-Id: I77620f6d1852a7d9e79c1ecee50f4297b4103b1c
2024-07-29 16:57:48 +01:00
Matthew Poremba
21f6e166b7 arch-vega: Panic on SDWAB / DPP VOPC unimplemented
If SDWAB or DPP are used on a VOPC instruction and those are not
implemented, it is highly likely to be a problem for the application.
Rather than continue to execute and cause undefined behavior, exit the
simulation with a panic showing the line of the instruction causing the
issue.

Change-Id: Ib3f94df7445d068b26907470c1f733be16cd2fc2
2024-07-25 16:18:14 -07:00
Matthew Poremba
b75fe56da5 arch-vega: Panic unimplemented SDWA/DPP for VOP1/VOP2
Add a panic if SDWA or DPP is used for an instruction which does not
implement support for it. If an application uses SDWA or DPP it likely
does not operate in the same way as the base instruction and therefore
gem5 should panic rather than continue. It is likely data is incorrect
which will make it more difficult to debug an application.

Change-Id: I68ac448b0d62941761ef4efa0169f95796270f48
2024-07-25 16:18:14 -07:00
Matthew Poremba
6558821e2d arch-vega: Add SDWAB for v_cmp_{eq,ne}_u16
This shows an example of how to use the previous commit which adds an
SDWAB helper. The execute() method of both are the same with the
exception of the lambda function passed to the helper method.

Change-Id: I5ffe361440b4020b9f7669c0ed946aa6b3bbec25
2024-07-25 16:18:14 -07:00
Matthew Poremba
69338703e7 arch-vega: Implement SDWAB helper
Implement a SDWAB helper which accepts a dynamic instruction and a
lambda function defining a comparison function taking two values and
returning a comparison result of 0 or 1 for false or true.

Current instructions which implement SDWA do so on a per-instruction
basis which adds a lot of redundant code. This allows for generic SDWAB
implementations for VOPC instructions.

All modifiers are implemented assuming that SDWAB VOPC instruction
comparison types may be U32, I32, F32, U16, I16, F16 (which exist), but
the helper is extensible to I8, U8, or F8.

Change-Id: Idab58a327c29dd19a1a5457237f3799a04f2031b
2024-07-25 16:18:13 -07:00
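
The idea behind 69338703e7, a single helper that takes the comparison as a lambda, can be sketched in Python. This is a simplified model: it omits the SDWA operand selection and modifiers that the real helper handles, and `sdwab_compare`, `v_cmp_eq_u16`, and `v_cmp_ne_u16` here are hypothetical functions operating on plain lists rather than dynamic instructions.

```python
from typing import Callable, Sequence


def sdwab_compare(src0: Sequence[int], src1: Sequence[int], exec_mask: int,
                  cmp: Callable[[int, int], bool]) -> int:
    """Apply `cmp` per active lane and pack the results into a bit mask."""
    result = 0
    for lane, (a, b) in enumerate(zip(src0, src1)):
        lane_active = (exec_mask >> lane) & 1
        if lane_active and cmp(a, b):
            result |= 1 << lane
    return result


def v_cmp_eq_u16(src0, src1, exec_mask):
    # Only the lambda differs between the two instructions.
    return sdwab_compare(src0, src1, exec_mask, lambda a, b: a == b)


def v_cmp_ne_u16(src0, src1, exec_mask):
    return sdwab_compare(src0, src1, exec_mask, lambda a, b: a != b)
```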
Matthew Poremba
a7bc4ca19a arch-vega: Fix unconditional clamps in VOP3 (#1379)
Some instructions are clamping floating point outputs unconditionally,
leading to incorrect results. This commit finds instructions with this
issue and checks the clamp bit before applying clamp.

Change-Id: Ibc6de3813d81fd4f9d2c98dd497d19dd34cf6bde
2024-07-25 08:06:00 -07:00
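
The fix in a7bc4ca19a amounts to making the clamp conditional. A minimal Python sketch follows; the [0.0, 1.0] range is an assumption about floating-point clamp semantics, and `apply_clamp` is a hypothetical name.

```python
def apply_clamp(value: float, clamp_bit: bool) -> float:
    """Clamp a floating-point result only when the instruction's clamp bit is set."""
    if not clamp_bit:
        # Without this check, every result would be clamped unconditionally.
        return value
    return min(max(value, 0.0), 1.0)
```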
Matthew Poremba
7dae1a1d25 arch-vega: Multiple SOPC fixes (#1366)
Make S_CMP_LT_U32 use < instead of <=. Change types of EQ / LG for U64
to be U64.

Change-Id: Ib0b3b7a46ba1aff16a6d439302ca087d988d6417
2024-07-23 12:45:52 -07:00
Ivana Mitrovic
82c91e8edb arch-riscv: Improve widening/narrowing vectors overlap check (#1331)
This PR improves the vector register groups overlap check in
widening/narrowing instructions.

- Fix wrong illegal overlap condition between VS2 and VD vector register
groups.
- Also check VS1 vector register group for overlap with VD in
vector-vector instructions.
- Parametrize widening/narrowing factors in overlap check function to
potentially handle more cases.

Fixes issue #442.
2024-07-22 10:54:02 -07:00
Erin (Jianghua) Le
b6f8ecb1be python: move cache coherence protocol check above imports (#1360)
This commit moves the requires() call that checks the cache coherence
protocol above the imports. This change was made for the chi private l1,
ruby mesi three level, mesi two level, and mi example cache hierarchies.
This ensures that a clear error message about having the wrong coherence
protocol is printed, rather than a less useful message.

Change-Id: I3bac1ffcb1f8a9d94e486237f880cf248e442ba8
2024-07-22 09:34:04 -07:00
Alexander Richardson
fc59109429 arch,arch-arm: Fix remaining implicit float conversion warnings in .isa (#1327)
This fixes the remaining implicit int/float conversions and enables the
float conversion warnings for clang when building the Arm instruction
execution logic. This depends on the previous fixes.

Change-Id: I51aac94644a483175842c36da2d49d308aaceb49
2024-07-18 10:43:12 -07:00
Erin (Jianghua) Le
aaa6566548 mem: Change long in src/mem/physical.cc to int64_t (#1275)
This changes `long`s in src/mem/physical.cc, which are 32 bits or more,
to `uint64_t`s, which are exactly 64 bits.

Change-Id: I64e089a2ac087bcf58b9c3c918c59dc5ff75d010
2024-07-18 10:12:24 -07:00
Robert Hauser
9b8c84cb5d arch-riscv: Overwrite getEMI() for timing expr (#1346)
TimingExpression enables runtime calculation of the commit latency in
MinorCPU. For this, machInst is obtained by getEMI() to match it with a
given instruction. By default, getEMI() always returns 0, so it is
overridden here to enable timing expressions for RISC-V. This was
already done for ARM (see src/arch/arm/insts/static_inst.hh).

Change-Id: I03d669b3439fd24e00cbf893f5db9951dfe56b1f

Signed-off-by: Robert Hauser <robert.hauser@uni-rostock.de>
2024-07-12 20:52:24 -07:00
Robert Hauser
5e5e8fb9c6 arch-riscv: Update local interrupts citation (#1347)
Updated the bib information of the local RISC-V interrupts.

Change-Id: I666c3df4529e159bd1946ca1a9623e47f84d5d9e

Signed-off-by: Robert Hauser <robert.hauser@uni-rostock.de>
2024-07-12 20:51:49 -07:00
Tommaso Marinelli
e3b41291da arch-riscv: Check VS1 group for overlap when widening/narrowing
Currently, only the VS2 register group is checked for overlap with VD
when executing a widening/narrowing instruction. This commit extends
the check to VS1, when applicable (i.e. vector-vector operations).

Change-Id: I892b7717c01e25546fb41e05afbd08fc40c60c59
2024-07-12 01:17:14 +00:00
Tommaso Marinelli
a8b7e9727d arch-riscv: Generalize widening/narrowing vectors overlap check
As of now, the widening/narrowing vector register groups overlap check
always assumes a SEW multiplication factor equal to 2 (for either VD or
VS2). This commit aims to make this check more generic.

Change-Id: I4311fc3624cd324ccfdf2a1920a19efc85357120
2024-07-12 01:17:14 +00:00
Tommaso Marinelli
5b693fd8b6 arch-riscv: Remove duplicate line
Change-Id: I32200aad5a59c9fd85f6ed783a4cebb841bf6ff1
2024-07-12 01:17:14 +00:00