gpu-compute: Implement per-request MTYPEs

GPU MTYPE is currently set using a global config passed to the PACoalescer. This patch enables MTYPE to be set by the shader on a per-request basis. In real hardware, the MTYPE is extracted from a GPUVM PTE during address translation. However, our current simulator only models x86 page tables, which do not have the appropriate bits for GPU MTYPEs. Rather than hacking non-x86 bits into our x86 page table models, this patch instead keeps an interval tree of all pages that request custom MTYPEs in the driver itself. This is currently only used to map host pages to the GPU as uncacheable, but is easily extensible to other MTYPEs.

Change-Id: I7daab0ffae42084b9131a67c85cd0aa4bbbfc8d6
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42216
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>

Committed by: Matthew Poremba
Parent: dfa712f041
Commit: ad43083bb3
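
The commit message describes the driver tracking pages that request a custom MTYPE and otherwise using the global default. The sketch below illustrates only that lookup idea in Python. It is not the simulator's code: `MtypeRanges`, `set_mtype`, and `lookup` are hypothetical names, the addresses are arbitrary, and a sorted list with `bisect` stands in for the interval tree under the simplifying assumption that registered ranges never overlap.

```python
import bisect

# 0b000 is one of the two encodings matching "0 0 x UC_All" in the option table.
UC_ALL = 0


class MtypeRanges:
    """Hypothetical per-page MTYPE map over non-overlapping [start, end) ranges."""

    def __init__(self, default_mtype):
        self.default_mtype = default_mtype
        self._starts = []  # sorted range start addresses
        self._ranges = []  # parallel list of (start, end, mtype) tuples

    def set_mtype(self, start, size, mtype):
        """Register [start, start + size) with a custom MTYPE."""
        if size <= 0:
            raise ValueError("size must be positive")
        if not 0 <= mtype <= 7:
            raise ValueError("MTYPE must be in 0-7")
        end = start + size
        idx = bisect.bisect_left(self._starts, start)
        # Simplifying assumption: reject any overlap with a neighbouring range.
        if idx > 0 and self._ranges[idx - 1][1] > start:
            raise ValueError("range overlaps an existing entry")
        if idx < len(self._ranges) and self._ranges[idx][0] < end:
            raise ValueError("range overlaps an existing entry")
        self._starts.insert(idx, start)
        self._ranges.insert(idx, (start, end, mtype))

    def lookup(self, vaddr):
        """Return the custom MTYPE covering vaddr, or the default if none does."""
        idx = bisect.bisect_right(self._starts, vaddr) - 1
        if idx >= 0:
            start, end, mtype = self._ranges[idx]
            if start <= vaddr < end:
                return mtype
        return self.default_mtype


ranges = MtypeRanges(default_mtype=6)
ranges.set_mtype(0x1000, 0x2000, UC_ALL)  # mark two 4 KiB pages as uncached
print(ranges.lookup(0x1800), ranges.lookup(0x3000))
```

Addresses inside a registered range get its MTYPE, and everything else falls back to the default, which matches the "default unless explicitly overwritten on a per-page basis" behaviour the option help text describes.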
@@ -173,6 +173,21 @@ parser.add_argument("--dgpu", action="store_true", default=False,
                     "transfered from host to device memory using runtime calls "
                     "that copy data over a PCIe-like IO bus.")
 
+# Mtype option
+#--   1  1  1   C_RW_S  (Cached-ReadWrite-Shared)
+#--   1  1  0   C_RW_US (Cached-ReadWrite-Unshared)
+#--   1  0  1   C_RO_S  (Cached-ReadOnly-Shared)
+#--   1  0  0   C_RO_US (Cached-ReadOnly-Unshared)
+#--   0  1  x   UC_L2   (Uncached_GL2)
+#--   0  0  x   UC_All  (Uncached_All_Load)
+# default value: 5/C_RO_S (only allow caching in GL2 for read. Shared)
+parser.add_argument("--m-type", type=int, default=5,
+                    help="Default Mtype for GPU memory accesses. This is the "
+                    "value used for all memory accesses on an APU and is the "
+                    "default mode for dGPU unless explicitly overwritten by "
+                    "the driver on a per-page basis. Valid values are "
+                    "between 0-7")
+
 Ruby.define_options(parser)
 
 # add TLB options to the parser
@@ -407,8 +422,15 @@ hsapp_gpu_map_vaddr = 0x200000000
 hsapp_gpu_map_size = 0x1000
 hsapp_gpu_map_paddr = int(Addr(args.mem_size))
 
+if args.dgpu:
+    # Default --m-type for dGPU is write-back gl2 with system coherence
+    # (coherence at the level of the system directory between other dGPUs and
+    # CPUs) managed by kernel boundary flush operations targeting the gl2.
+    args.m_type = 6
+
 # HSA kernel mode driver
-gpu_driver = GPUComputeDriver(filename = "kfd", isdGPU = args.dgpu)
+gpu_driver = GPUComputeDriver(filename = "kfd", isdGPU = args.dgpu,
+                              dGPUPoolID = 1, m_type = args.m_type)
 
 # Creating the GPU kernel launching components: that is the HSA
 # packet processor (HSAPP), GPU command processor (CP), and the