gpu-compute,arch-vega: Fix ALU-only LDS counters

A few LDS instructions perform local ALU operations and a register
writeback but are marked as loads. According to a comment that is
several years old, they are marked as loads because that fits the
pipeline logic better. In the VEGA ISA these instructions (swizzle,
permute, bpermute) do not decrement the LDS load counter, so the counter
grows each time one of them executes. Since wavefront slots are
persistent, applications with a few thousand kernels can eventually hang
because the simulator believes there are not enough resources.

This changeset fixes the problem by decrementing the LDS load counter in
the execute method of these instructions. The same fix was already
integrated in the GCN3 ISA. Here the decrement is placed next to a
similar comment about scheduling register file writes.
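
The leak and the fix can be illustrated with a minimal standalone sketch.
It is not gem5 code: WavefrontSlot, issueAluOnlyLdsOp and leakedAfter are
hypothetical names, and only the counter name rdLmReqsInPipe comes from
the diff. The sketch assumes the counter is incremented once when the
instruction is issued and that no LDS response ever arrives to decrement
it.

```cpp
// Hypothetical stand-in for a persistent wavefront slot. Only the counter
// name rdLmReqsInPipe is taken from the diff; everything else is assumed.
struct WavefrontSlot
{
    int rdLmReqsInPipe = 0;
};

// Models one ALU-only LDS instruction (e.g. swizzle or permute).
// Assumption: the pipeline counts it as an LDS load when it is issued.
void
issueAluOnlyLdsOp(WavefrontSlot &wf, bool decrementInExecute)
{
    wf.rdLmReqsInPipe++;          // issue: counted as an LDS load

    // execute: the result is computed locally, so no LDS response will
    // come back later to decrement the counter.
    if (decrementInExecute) {
        wf.rdLmReqsInPipe--;      // the decrement added by this changeset
    }
}

// Reuses one slot for numKernels kernels and returns the counter value
// left behind in that slot afterwards.
int
leakedAfter(int numKernels, int opsPerKernel, bool decrementInExecute)
{
    WavefrontSlot slot;           // persists across all kernels
    for (int k = 0; k < numKernels; ++k) {
        for (int i = 0; i < opsPerKernel; ++i) {
            issueAluOnlyLdsOp(slot, decrementInExecute);
        }
    }
    return slot.rdLmReqsInPipe;
}
```

Under these assumptions, leakedAfter(3000, 4, false) leaves 12000 phantom
outstanding loads in the slot, while leakedAfter(3000, 4, true) leaves 0.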

Change-Id: Ife5237a2cae7213948c32ef266f4f8f22917351c
Matthew Poremba
2023-08-23 19:21:55 -05:00
parent c218104f52
commit 90a518e885
2 changed files with 22 additions and 0 deletions


@@ -35971,6 +35971,11 @@ namespace VegaISA
*/
wf->computeUnit->vrf[wf->simdId]->
scheduleWriteOperandsFromLoad(wf, gpuDynInst);
/**
* Similarly, this counter could build up over time, even across
* multiple wavefronts, and cause a deadlock.
*/
wf->rdLmReqsInPipe--;
} // execute
// --- Inst_DS__DS_PERMUTE_B32 class methods ---
@@ -36054,6 +36059,11 @@ namespace VegaISA
*/
wf->computeUnit->vrf[wf->simdId]->
scheduleWriteOperandsFromLoad(wf, gpuDynInst);
/**
* Similarly, this counter could build up over time, even across
* multiple wavefronts, and cause a deadlock.
*/
wf->rdLmReqsInPipe--;
} // execute
// --- Inst_DS__DS_BPERMUTE_B32 class methods ---
@@ -36137,6 +36147,11 @@ namespace VegaISA
*/
wf->computeUnit->vrf[wf->simdId]->
scheduleWriteOperandsFromLoad(wf, gpuDynInst);
/**
* Similarly, this counter could build up over time, even across
* multiple wavefronts, and cause a deadlock.
*/
wf->rdLmReqsInPipe--;
} // execute
// --- Inst_DS__DS_ADD_U64 class methods ---


@@ -383,6 +383,13 @@ ComputeUnit::startWavefront(Wavefront *w, int waveId, LdsChunk *ldsChunk,
stats.waveLevelParallelism.sample(activeWaves);
activeWaves++;
panic_if(w->wrGmReqsInPipe, "GM write counter for wavefront non-zero\n");
panic_if(w->rdGmReqsInPipe, "GM read counter for wavefront non-zero\n");
panic_if(w->wrLmReqsInPipe, "LM write counter for wavefront non-zero\n");
panic_if(w->rdLmReqsInPipe, "LM read counter for wavefront non-zero\n");
panic_if(w->outstandingReqs,
"Outstanding reqs counter for wavefront non-zero\n");
}
/**