arch-vega: Fix multi-dword setElem in PackedReg (#1664)

There are two issues with setting an element in PackedReg when the element spans multiple dwords. First, the mask is applied without being inverted, so the AND keeps only the element's bits and clobbers the rest of both dwords. Second, the value is shifted in its narrower input type, so its upper bits are shifted out. Fix this by using the inverted mask to clear only the bits where the value will be placed, and by promoting the value to a 64-bit type before shifting it into place.
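
The fixed sequence can be sketched in isolation as below. This is a minimal stand-alone illustration, not the gem5 code: the function name setElemSketch, its parameter list, and the use of uint32_t as the "narrower input type" are assumptions made for the example. It assumes 0 < elem_bits < 64 and qw_lbit + elem_bits <= 64.

```cpp
#include <cstdint>

// Hypothetical helper for illustration only; not part of PackedReg.
// Writes the low elem_bits bits of value into the 64-bit window formed by
// dwords[ldw] (low half) and dwords[udw] (high half), starting at bit
// qw_lbit of that window.
inline void
setElemSketch(uint32_t *dwords, int ldw, int udw, int qw_lbit,
              int elem_bits, uint32_t value)
{
    // Combine the two dwords into one 64-bit working value.
    uint64_t qword = (uint64_t(dwords[udw]) << 32) | dwords[ldw];

    // Keep only the element's bits of the input value. The value is
    // promoted to 64 bits first so the later shift cannot drop bits.
    uint64_t elem_mask = (1ULL << elem_bits) - 1;
    uint64_t value64 = value;
    value64 &= elem_mask;

    // Clear only the destination bits (note the inverted mask), then
    // OR the shifted value into place.
    elem_mask <<= qw_lbit;
    qword &= ~elem_mask;
    qword |= value64 << qw_lbit;

    // Split the result back into the two dwords.
    dwords[udw] = uint32_t(qword >> 32);
    dwords[ldw] = uint32_t(qword & 0xFFFFFFFFULL);
}
```

For example, writing the 16-bit value 0x1234 at bit 24 of an all-ones pair of dwords should leave 0x34FFFFFF in the lower dword and 0xFFFFFF12 in the upper one. Shifting the 32-bit value directly would lose the 0x12 byte, and ANDing with the uninverted mask would zero every bit outside the element.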
2024-10-14 10:19:52 -07:00
parent 20965f571b
commit deb8f983a1
1 changed files with 6 additions and 3 deletions
--- a/src/arch/amdgpu/vega/operand.hh
+++ b/src/arch/amdgpu/vega/operand.hh
@@ -960,11 +960,14 @@ class PackedReg
        uint64_t elem_mask = (1ULL << ELEM_SIZE) - 1;
        value &= elem_mask;
        // Clear the bits where the value goes so that operator| can be used.
        elem_mask <<= qw_lbit;
-        qword &= elem_mask;
+        qword &= ~elem_mask;
-        value <<= qw_lbit;
-        qword |= value;
+        // Promote to 64-bit to prevent shifting out of range
+        uint64_t value64 = value;
+        value64 <<= qw_lbit;
+        qword |= value64;
        dwords[udw] = uint32_t(qword >> 32);
        dwords[ldw] = uint32_t(qword & mask(32));