dev-amdgpu: Better handling for queue remapping

The amdgpu driver can, at *any* time, tell the device to unmap a queue
to force the queue descriptor to be written back to main memory in the
form of a memory queue descriptor (MQD). It will then immediately remap
the queue and continue writing the doorbell to the queue. It is possible
that the doorbell write occurs after the queue is unmapped but before it
is remapped. In this situation, we need to check the updated value of
the doorbell for the queue and write that to the queue after it is
mapped.

To handle this, a pending doorbell packet map is created to hold a
packet to replay when the queue is mapped. Because PCI in gem5
implements only the atomic protocol port, we cannot use the original
packet as it must respond in the same Tick. This patch also fixes the
doorbell maps not being cleared on unmapping, which ensures that the
doorbell is not found in writeDoorbell and is instead placed in the
pending doorbell map. This includes fixing the doorbell offset value in
the doorbell to VMID map, which is now multiplied by four as it is a
dword address.
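
As a rough illustration of the replay idea described above, the
following standalone sketch models a device that parks doorbell writes
for unmapped queues and replays the latest value once the queue is
mapped again. It is not the gem5 implementation: the class
DoorbellModel, its members, and the use of a plain value instead of a
packet are all assumptions made for this example. Only the names
writeDoorbell and processPendingDoorbells, and the dword to byte
conversion (shift left by two), come from this commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the GPU device; not the gem5 class.
class DoorbellModel
{
  public:
    struct Write { uint32_t offset; uint64_t value; };

    // Doorbell write from the driver, addressed by byte offset.
    void
    writeDoorbell(uint32_t byteOffset, uint64_t value)
    {
        if (mapped.count(byteOffset)) {
            delivered.push_back({byteOffset, value});
        } else {
            // Queue is currently unmapped: keep only the newest value
            // so it can be replayed when the queue is mapped again.
            pending[byteOffset] = value;
        }
    }

    // The map packet carries a dword offset; convert it to a byte
    // offset so it matches the addresses used by writeDoorbell.
    void
    mapQueue(uint32_t dwordOffset)
    {
        uint32_t byteOffset = dwordOffset << 2;
        mapped.insert(byteOffset);
        processPendingDoorbells(byteOffset);
    }

    // Clearing the mapping is what makes later writes take the
    // pending path instead of being delivered to a stale queue.
    void
    unmapQueue(uint32_t dwordOffset)
    {
        mapped.erase(dwordOffset << 2);
    }

    // Replay a parked doorbell write, if one exists for this offset.
    void
    processPendingDoorbells(uint32_t byteOffset)
    {
        auto it = pending.find(byteOffset);
        if (it == pending.end())
            return;
        delivered.push_back({byteOffset, it->second});
        pending.erase(it);
    }

    std::size_t pendingCount() const { return pending.size(); }

    // Doorbell writes that reached a mapped queue, in order.
    std::vector<Write> delivered;

  private:
    std::unordered_set<uint32_t> mapped;
    std::unordered_map<uint32_t, uint64_t> pending;
};
```

In this sketch, a write to byte offset 0x40 that arrives between
unmapQueue(0x10) and mapQueue(0x10) is held in the pending map and
delivered by the mapQueue call, since 0x10 << 2 == 0x40.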

This was tested using TensorFlow 2.0's MNIST example, which hit this
issue consistently. With this patch, it now makes progress and does
issue pending doorbell writes.

Change-Id: Ic6b401d3fe7fc46b7bcbf19a769cdea6814e7d1e
Matthew Poremba
2023-10-13 14:59:56 -05:00
parent d05433b3f6
commit 37da1c45f3
3 changed files with 41 additions and 2 deletions

@@ -384,7 +384,10 @@ PM4PacketProcessor::mapQueues(PM4Queue *q, PM4MapQueues *pkt)
"Mapping mqd from %p %p (vmid %d - last vmid %d).\n",
addr, pkt->mqdAddr, pkt->vmid, gpuDevice->lastVMID());
-gpuDevice->mapDoorbellToVMID(pkt->doorbellOffset,
+// The doorbellOffset is a dword address. We shift by two / multiply
+// by four to get the byte address to match doorbell addresses in
+// the GPU device.
+gpuDevice->mapDoorbellToVMID(pkt->doorbellOffset << 2,
gpuDevice->lastVMID());
QueueDesc *mqd = new QueueDesc();
@@ -444,6 +447,8 @@ PM4PacketProcessor::processMQD(PM4MapQueues *pkt, PM4Queue *q, Addr addr,
DPRINTF(PM4PacketProcessor, "PM4 mqd read completed, base %p, mqd %p, "
"hqdAQL %d.\n", mqd->base, mqd->mqdBase, mqd->aql);
+gpuDevice->processPendingDoorbells(offset);
}
void
@@ -472,6 +477,8 @@ PM4PacketProcessor::processSDMAMQD(PM4MapQueues *pkt, PM4Queue *q, Addr addr,
// Register doorbell with GPU device
gpuDevice->setSDMAEngine(pkt->doorbellOffset << 2, sdma_eng);
gpuDevice->setDoorbellType(pkt->doorbellOffset << 2, RLC);
+gpuDevice->processPendingDoorbells(pkt->doorbellOffset << 2);
}
void
@@ -576,6 +583,7 @@ PM4PacketProcessor::unmapQueues(PM4Queue *q, PM4UnmapQueues *pkt)
gpuDevice->deallocatePasid(pkt->pasid);
break;
case 2:
panic("Unmapping queue selection 2 unimplemented\n");
break;
case 3: {
auto &hsa_pp = gpuDevice->CP()->hsaPacketProc();