dev-amdgpu,mem-ruby: Add support to checkpoint and restore between kernels in GPUFS (#377)
Previously, GPU checkpointing worked only if the checkpoint was created before the first kernel execution. This pull request adds support for checkpointing between any two kernel calls. It does so by doing the following:

- Adds flush support to the GPU_VIPER protocol
- Adds flush support to the GPUCoalescer
- Updates the cache recorder to use the GPUCoalescer during simulation cooldown and cache warmup
@@ -158,6 +158,16 @@ def addRunFSOptions(parser):
         help="Root partition of disk image",
     )
 
+    parser.add_argument(
+        "--disable-avx",
+        action="store_true",
+        default=False,
+        help="Disables AVX. AVX is used in some ROCm libraries but "
+        "does not have checkpointing support yet. If simulation either "
+        "creates a checkpoint or restores from one, then AVX needs to "
+        "be disabled for correct functionality ",
+    )
+
 
 def runGpuFSSystem(args):
     """
@@ -234,7 +234,7 @@ def makeGpuFSSystem(args):
     # If we are using KVM cpu, enable AVX. AVX is used in some ROCm libraries
     # such as rocBLAS which is used in higher level libraries like PyTorch.
     use_avx = False
-    if ObjectList.is_kvm_cpu(TestCPUClass):
+    if ObjectList.is_kvm_cpu(TestCPUClass) and not args.disable_avx:
         # AVX also requires CR4.osxsave to be 1. These must be set together
         # of KVM will error out.
         system.workload.enable_osxsave = 1
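For reviewers who want to check the flag's behavior without booting a full-system simulation, the condition changed in the second hunk can be exercised in isolation. The sketch below is only an illustration under stated assumptions: `build_parser` and `should_enable_avx` are hypothetical helpers that are not part of this change, the help text is shortened, and the boolean `is_kvm_cpu` stands in for the result of `ObjectList.is_kvm_cpu(TestCPUClass)`.

```python
import argparse


def build_parser():
    """Hypothetical parser carrying only the --disable-avx flag."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--disable-avx",
        action="store_true",
        default=False,
        help="Disable AVX when creating or restoring a checkpoint.",
    )
    return parser


def should_enable_avx(is_kvm_cpu, args):
    """Mirror the updated condition: AVX only on a KVM CPU and only
    when --disable-avx was not passed.

    is_kvm_cpu is an assumed stand-in for
    ObjectList.is_kvm_cpu(TestCPUClass).
    """
    return is_kvm_cpu and not args.disable_avx


if __name__ == "__main__":
    parser = build_parser()
    for argv in ([], ["--disable-avx"]):
        args = parser.parse_args(argv)
        print(argv, should_enable_avx(True, args))
```

Because the flag uses `store_true` with a default of `False`, existing command lines keep AVX enabled on KVM CPUs. Only runs that create or restore a checkpoint need to pass `--disable-avx`.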