gpu-compute: Support cache line sizes >64B in GPUFS (#939)

This change fixes two issues:

1) The --cacheline_size option was setting the system cache line size but not the Ruby cache line size, and the mismatch was causing assertion failures.
2) The submitDispatchPkt() function accesses the kernel object in chunks, with the chunk size equal to the cache line size. For cache line sizes >64B (e.g. 128B), the kernel object is not guaranteed to be aligned to a cache line, so a single chunk could span two separate device memories, causing the memory access to fail.

Change-Id: I8e45146901943e9c2750d32162c0f35c851e09e1
Co-authored-by: Michael Boyer <Michael.Boyer@amd.com>
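The alignment problem in item 2 can be illustrated with a small Python sketch. This is not the gem5 submitDispatchPkt() code, which is C++ and is not part of the hunk shown here. The helper names (naive_chunks, aligned_chunks, crosses), the 0x1000 boundary between two device memories, and the 64B-aligned but not 128B-aligned kernel object address are all hypothetical. The sketch assumes that device memory boundaries fall on multiples of the cache line size. Under that assumption, splitting the access at cache-line boundaries, so that the first and last chunks may be shorter, is one way to avoid straddling. It is not necessarily the approach this commit takes.

```python
from typing import List, Tuple

# (start address, length in bytes)
Chunk = Tuple[int, int]


def naive_chunks(addr: int, size: int, line_size: int) -> List[Chunk]:
    """Split [addr, addr + size) into line_size pieces starting at addr.

    If addr is not line-aligned, every chunk is offset from the line grid.
    """
    chunks = []
    offset = 0
    while offset < size:
        length = min(line_size, size - offset)
        chunks.append((addr + offset, length))
        offset += length
    return chunks


def aligned_chunks(addr: int, size: int, line_size: int) -> List[Chunk]:
    """Split [addr, addr + size) at multiples of line_size.

    The first and last chunks may be shorter than line_size, but no chunk
    crosses a line boundary.
    """
    chunks = []
    end = addr + size
    cur = addr
    while cur < end:
        # First line boundary strictly after cur.
        next_line = (cur // line_size + 1) * line_size
        chunk_end = min(next_line, end)
        chunks.append((cur, chunk_end - cur))
        cur = chunk_end
    return chunks


def crosses(chunk: Chunk, boundary: int) -> bool:
    """True if the chunk has bytes on both sides of boundary."""
    start, length = chunk
    return start < boundary < start + length


if __name__ == "__main__":
    # Hypothetical layout: one device memory ends and the next begins here.
    BOUNDARY = 0x1000
    KERNEL_OBJ = 0x0FC0  # 64B-aligned, but not 128B-aligned
    KERNEL_OBJ_SIZE = 256

    for line in (64, 128):
        naive = naive_chunks(KERNEL_OBJ, KERNEL_OBJ_SIZE, line)
        aligned = aligned_chunks(KERNEL_OBJ, KERNEL_OBJ_SIZE, line)
        print(line, "naive straddling:",
              [hex(s) for s, n in naive if crosses((s, n), BOUNDARY)])
        print(line, "aligned straddling:",
              [hex(s) for s, n in aligned if crosses((s, n), BOUNDARY)])
```

With 64B lines, neither split produces a straddling chunk, because the object start is already line-aligned. With 128B lines, the naive split produces the chunk starting at 0xfc0, which crosses 0x1000. The aligned split yields (0xfc0, 64), (0x1000, 128), (0x1080, 64) and avoids that chunk.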
2024-03-20 11:09:25 -07:00
parent 2b67d0eba6
commit ba2f5615ba
2 changed files with 11 additions and 3 deletions
--- a/configs/example/gpufs/Disjoint_VIPER.py
+++ b/configs/example/gpufs/Disjoint_VIPER.py
@@ -58,6 +58,8 @@ class Disjoint_VIPER(RubySystem):
            self.network_cpu = DisjointSimple(self)
            self.network_gpu = DisjointSimple(self)

+        self.block_size_bytes = options.cacheline_size
+
        # Construct CPU controllers
        cpu_dir_nodes = construct_dirs(options, system, self, self.network_cpu)
        (cp_sequencers, cp_cntrl_nodes) = construct_corepairs(