derek/gem5 - gem5 - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Pranith	50f652a2ee	Implement BTB using the cache library (#1537 ) This enables the BTB to be associative and use various replacement policies.	2024-10-10 17:05:22 +01:00
pre-commit-ci[bot]	54487d3bf6	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2024-10-09 14:04:56 +00:00
Matthew Poremba	4f7b3ed827	mem-ruby: Remove static methods from RubySystem (#1453 ) There are several parts to this PR to work towards #1349 . (1) Make RubySystem::getBlockSizeBytes non-static by providing ways to access the block size or passing the block size explicitly to classes. The main changes are: - DataBlocks must be explicitly allocated. A default ctor still exists to avoid needing to heavily modify SLICC. The size can be set using a realloc function, operator=, or copy ctor. This is handled completely transparently meaning no protocol or config changes are required. - WriteMask now requires block size to be set. This is also handled transparently by modifying the SLICC parser to identify WriteMask types and call setBlockSize(). - AbstractCacheEntry and TBE classes now require block size to be set. This is handled transparently by modifying the SLICC parser to identify these classes and call initBlockSize() which calls setBlockSize() for any DataBlock or WriteMask. - All AbstractControllers now have a pointer to RubySystem. This is assigned in SLICC generated code and requires no changes to protocol or configs. - The Ruby Message class now requires block size in all constructors. This is added to the argument list automatically by the SLICC parser. (2) Relax dependence on common functions in src/mem/ruby/common/Address.hh so that RubySystem::getBlockSizeBits is no longer static. Many classes already have a way to get block size from the previous commit, so they simply multiply by 8 to get the number of bits. For handling SLICC and reducing the number of changes, define makeCacheLine, getOffset, etc. in RubyPort and AbstractController. The only protocol changes required are to replace any "RubySystem::foo()" calls with "m_ruby_system->foo()". For classes which do not have a way to get access to block size but still used makeLineAddress, getOffset, etc., the block size must be passed to that class. 
This requires some changes to the SimObject interface for two commonly used classes: DirectoryMemory and RubyPrefetcher, resulting in user-facing API changes. User-facing API changes: - DirectoryMemory and RubyPrefetcher now require the cache line size as a non-optional argument. - RubySequencer SimObjects now require RubySystem as a non-optional argument. - TesterThread in the GPU ruby tester now requires the cache line size as a non-optional argument. (3) Removes static member variables in RubySystem which control randomization, cooldown, and warmup. These are mostly used by the Ruby Network. The network classes are modified to take these former static variables as parameters which are passed to the corresponding method (e.g., enqueue, delayHead, etc.) rather than needing a RubySystem object at all. Change-Id: Ia63c2ad5cf0bf9d1cbdffba5d3a679bb4d3b1220 (4) There are two major SLICC generated static methods: getNumControllers() on each cache controller which returns the number of controllers created by the configs at run time and the functions which access this method, which are MachineType_base_count and MachineType_base_number. These need to be removed to create multiple RubySystem objects otherwise NetDest, version value, and other objects are incorrect. To remove the static requirement, MachineType_base_count and MachineType_base_number are moved to RubySystem. Any class which needs to call these methods must now have a pointer to a RubySystem. To enable that, several changes are made: - RubyRequest and Message now require a RubySystem pointer in the constructor. The pointer is passed to fields in the Message class which require a RubySystem pointer (e.g., NetDest). SLICC is modified to do this automatically. - SLICC structures may now optionally take an "implicit constructor" which can be used to call a non-default constructor for locally defined variables (e.g., temporary variables within SLICC actions). 
A statement such as "NetDest bcast_dest;" in SLICC will implicitly append a call to the NetDest constructor taking RubySystem, for example. - RubySystem gets passed to Ruby network objects (Network, Topology).	2024-10-08 08:14:50 -07:00
Matthew Poremba	f5858fe81f	dev-amdgpu: Deprecate rom and mmio trace params (#1633 ) The ROM field was originally intended as a future alternate way to load VBIOS without the ROM being on the disk image. This code path is never taken for the devices gem5 supports and there is no gem5 implementation. Deprecate the rom_binary field for this reason. Similarly, MMIO traces were only used for Vega10. Deprecate this as Vega10 is now deprecated. The MMIO trace reader is kept as it may still be useful in the future. It is still the primary way to handle devices which have graphics capability. None of the devices supported by gem5 have graphics now that Vega10 is deprecated.	2024-10-07 07:12:07 -07:00
Matthew Poremba	24504c9a3e	dev-amdgpu: Use GPU specific cache line size (#1621 ) Invalidate requests align to system cache line size. This causes problems if the GPU cache hierarchy's cache line size is different from the system's, as the unaligned requests never return, leading to deadlock on deferred dispatch. This commit uses the cache line size from the GPU memory manager and makes the cache line size there non-optional. Tested with multiple RubySystems where CPU side was 64B and GPU side was 128B cache lines.	2024-10-03 08:47:08 -07:00
Matthew Poremba	c8c75959ad	configs: Deprecate Vega10 (#1619 ) Vega10 is no longer officially supported by ROCm and ROCm is starting to use some packet types not supported. These were originally kept to allow users to use older disk images with newer gem5. Going forward the gem5 version and gem5-resources releases will be required to be the same to prevent lingering old configs. As a replacement for vega10*.py, mi300.py or mi200.py should be used. HIP examples, cookbook, and rodinia configs can be replaced with the standard flow of building / obtaining the GPU application and running using mi300.py or mi200.py as they do not require any input options and therefore do not require changes to the disk image.	2024-10-02 14:18:41 -07:00
Erin (Jianghua) Le	c10feed524	tests, configs, util, mem, python, systemc: Change base 10 units to base 2 (#1605 ) This commit changes metric units (e.g. kB, MB, and GB) to binary units (KiB, MiB, GiB) in various files. This PR covers files that were missed by a previous PR that also made these changes.	2024-10-01 11:18:05 -07:00
Bobby R. Bruce	f2f86a3e42	stdlib, python: Add warning message and clarify binary vs metric units (#1479 ) This PR changes memory and cache sizes in various parts of the gem5 codebase to use binary units (e.g. KiB) instead of metric units (e.g. kB). This makes the codebase more consistent, as gem5 automatically converts memory and cache sizes that are in metric units to binary units. This PR also adds a warning message to let users know when an auto-conversion from base 10 to base 2 units occurs. There were a few places in configs and in the comments of various files where I didn't change the metric units, as I couldn't figure out where the parameters with those units were being used.	2024-09-17 17:32:27 +00:00
Daniel Carvalho	51863d322f	gpu-compute: Reuse RP list in GPU_VIPER (#1530 ) It is safer to reuse the dynamic list than manually listing all possible replacement policies. --------- Signed-off-by: odanrc <odanrc@yahoo.com.br>	2024-09-09 09:18:01 -07:00
Giacomo Travaglini	57d82fdbb4	sim-se, arch: Fix syscall parameter sizes for 32-bit OSs (#1482 ) A bug was uncovered: for various syscalls that use 64-bit parameters, the ABI for 32-bit operating systems was passing the wrong values to the syscalls, due to discrepancies between the target and guest OS. This commit fixes that by replacing 64-bit types, or types that are platform specific in size, with the exact correspondent for the guest OS, thus producing the correct signature for the respective syscalls. On top of this, the --param argument is added to the starter_se script, in order to support attachment of remote debuggers.	2024-09-03 09:49:59 +01:00
Marco Kurzynski	a8447b7fc0	arch-vega: Pass s_memtime through smem pipe (#1350 ) The Vega ISA's s_memtime instruction is used to obtain a cycle value from the GPU. Previously, this was implemented to obtain the cycle count when the memtime instruction reached the execute stage of the GPU pipeline. However, from microbenchmarking we have found that this underreports the latency for memtime instructions relative to real hardware. Thus, we changed its behavior to go through the scalar memory pipeline and obtain a latency value from the SQC (L1 I$). This mirrors the suggestion of the AMD Vega ISA manual that s_memtime should be treated like a s_load_dwordx2. The default latency was set based on microbenchmarking. Change-Id: I5e251dde28c06fe1c492aea4abf9f34f05784420	2024-08-26 19:47:04 -07:00
Erin Le	e1db67c4bd	configs, dev, learning-gem5, python, tests: more clarification This commit contains the rest of the base 2 vs base 10 cache/memory size clarifications. It also changes the warning message to use warn(). With these changes, the warning message should now no longer show up during a fresh compilation of gem5. Change-Id: Ia63f841bdf045b76473437f41548fab27dc19631	2024-08-23 18:02:42 -07:00
Tiberiu Bucur	fe6ef662d1	configs: Add --param to starter_se This commit adds the --param option to the starter_se configuration script for the Arm ISA. This is in order to support attaching remote debugger sessions. Change-Id: I2d8cc9f677f731948872003cca6066d1072ad570 Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-08-20 16:18:24 +01:00
Bobby R. Bruce	f600db4a98	gpu-compute,tests: Move GPU tests to testlib (#1270 ) A new host tag `gcn_gpu` has been added. This allows for selection of those GPU tests which depend upon the gcn-gpu docker image to run. In addition to this, the square GPU test has been moved to the CI tests. This ensures some GPU code is compiled and run on every PR.	2024-08-19 10:58:06 -07:00
Matt Sinclair	03ddd0b75f	gpu-compute: fix GPU TLB outstandingReqs vs. associativity The GPU TLB maxOutstandingReqs field gets limited by the associativity. In the current setup, this means that the max outstanding requests is 32 even though the setup is for 64 entries. Update the associativity to all 64 entries. Change-Id: I2104e4647d97bf4d1cf5ac447e38ad6ac6a1a0d8	2024-08-07 16:16:01 -05:00
Matthew Poremba	ddc9a18536	configs: GPUFS: Disable KVM perf counters by default (#1391 ) This is on by default in gem5 (see src/cpu/kvm/BaseKvmCPU.py), however the perf counters only measure host instruction counters and GPUFS is not concerned about accuracy of KVM CPU stats. There is also a larger set of users who have access to KVM, but do not have the paranoid level low enough to attach performance counters. Therefore, make the performance counters OFF by default. They can still be enabled, but this will allow for a larger set of users to follow the upcoming GPUFS documentation without needing to read through a troubleshooting section after seeing a gem5 error about the KVM paranoid level. Change-Id: I6b465559edf3ce17e7117ada049c60bd39aecd83	2024-07-29 12:26:10 -07:00
Yangyu Chen	2b902b0aec	arch-riscv: add rv32 option to FS Linux config file (#1312 ) Since we have supported RISC-V 32, add this option to allow the RISC-V 32 full system to run easily. Signed-off-by: Yangyu Chen <cyy@cyyself.name>	2024-07-10 11:41:48 -07:00
Mahyar Samani	590bb1fbbb	Adding an example for Spatter (#1272 ) This change adds a new utility function for processing Spatter traces into SpatterKernels under parse_kernels. Additionally, it adds documentation for all the utility functions in spatter_kernel.py. Lastly, it adds an example script for running one spatter trace using SpatterGenerator to the examples.	2024-06-21 02:23:41 -07:00
Matthew Poremba	ed860dfe54	configs: Check before use replacement policy options (#1261 ) Rather than adding the options to every config that might be using GPU_VIPER.py, just change the Ruby config to check if the option is available before trying to use it. Otherwise, reverts to what was the default on stable. Change-Id: Ia6f1d0827d489ee2a35c598b644461cbff59e247	2024-06-20 09:50:29 -07:00
Bobby R. Bruce	1a00ecfaf9	stdlib,configs,tests: Add gem5 MultiSim (MultiProcessing for gem5) (#1167 ) This allows for multiple gem5 simulations to be spawned from a single parent gem5 process, as defined in a single gem5 configuration. In this design _all_ the `Simulator`s are defined in the simulation script and then added to the multisim module. For example:
```py
from gem5.simulate.Simulator import Simulator
import gem5.utils.multisim as multisim

# Construct the board[0] and board[1] as you wish here...

simulator1 = Simulator(board=board[0], id="board-1")
simulator2 = Simulator(board=board[1], id="board-2")

multisim.add_simulator(simulator1)
multisim.add_simulator(simulator2)
```
This specifies that two simulations are to be run in parallel in separate threads: one specified by `simulator1` and another by `simulator2`. They are then added to MultiSim via the `multisim.add_simulator` function. The user can specify an id via the Simulator constructor. This is used to give each process a unique id and output directory name. Given this, the id should be a helpful name describing the simulation being specified. If not specified one is automatically given. To run these simulators we use `<gem5 binary> -m gem5.utils.multisim <script> -p <num_processes>`. Note: multisim is an executable module in gem5. This is the same module we import into our scripts to add the simulators. This is an intentionally modular encapsulated design. When the module processes a script it will schedule multiple gem5 jobs and, dependent on the number of processes specified, will create child gem5 processes to process these jobs (jobs are just gem5 simulations in this case). The `--processes` (`-p`) argument is optional and if not specified the max number of processes which can be run concurrently will be the number of available threads on the host system. The id for each process is used to create a subdirectory inside the `outputdir` (`m5out`) of that id name. 
E.g., in the example above the IDs are `board-1` and `board-2`. Therefore the `m5out` directory will look as follows:
```sh
- m5out
    - board-1
        - stats.txt
        - config.ini
        - config.json
        - terminal.out
    - board-2
        - stats.txt
        - config.ini
        - config.json
        - terminal.out
```
Each simulation's output is encapsulated inside the subdirectory of the id name. If the multisim configuration script is passed directly to gem5 (like a traditional gem5 configuration script, i.e.: `<gem5 binary> <script>`), the user may run a single simulation specified in that script by passing its id as an argument. E.g. `<gem5 binary> <script> board-1` will run the `board-1` simulation specified in `script`. If no argument is passed an Exception is raised asking the user to either specify one or use the MultiSim module if multiprocessing is needed. If the user desires a list of ids of the simulations specified in a given MultiSim script, they can do so by passing the `--list` (`-l`) parameter to the config script. I.e., `<gem5 binary> <script> --list` will list all the IDs for all the simulations specified in `script`. This change comes with two new example scripts found in 'configs/example/gem5_library/multisim' to demonstrate multisim in both an SE and FS mode simulation. Tests have been added which run these scripts as part of gem5's Daily suite of tests.
Notes
=====
* Bug fixed: The `NoCache` classic cache hierarchy has been modified so the Xbar is no longer set with a `__func__` call. This interfered with MultiProcessing as this structure is not serializable via Pickle. This was quite bad design anyway so should be changed.
* Change: `readfile_contents` parameter previously wrote its value to a file called "readfile" in the output directory. This has been changed to write to a file called "readfile_{hash}" with "{hash}" being a hash of the `readfile_contents`. This ensures that, during multisim running, this file is not overwritten by other processes. 
* Removal note: This implementation supersedes the functionality outlined in 'src/python/gem5/utils/multiprocessing'. As such, this code has been removed.
Limitations/Things to Fix/Improve
=================================
* Though each Simulator process has its own output directory (a subdirectory within m5out, with an ID set by the user unique to that Simulator), the stdout and stderr are still output to the terminal, not the output directory. This results in: 1. stdout and stderr data lost and not recorded for these runs. 2. An incredibly noisy terminal output.
* Each process uses the same cached resources. While there are locks on resources when downloading, each process will hash the resources it requires to ensure they are valid. This is very inefficient in cases where resources are common between processes (e.g., you may have 10 processes each using the same disk image with each process hashing the disk image independently to give the same result to validate the resources).
Change-Id: Ief5a3b765070c622d1f0de53ebd545c85a3f0eee --------- Signed-off-by: Jason Lowe-Power <jason@lowepower.com> Co-authored-by: Jason Lowe-Power <jason@lowepower.com>	2024-06-18 09:34:39 -07:00
Matthew Poremba	3cf638e217	gpu-compute, util-m5: add GPU kernel exit events (#1217 ) The GPUFS scripts include support for dumping and resetting stats at kernel boundaries by identifying specific GPU kernel exit events. This commit extends that support to work with GPU SE-mode support. Change-Id: I662233ae71e2987d90af1fd0100e29036b2ef1c6	2024-06-14 08:13:27 -07:00
Matthew Poremba	b3d9dc42d4	configs: Add replacement policy options for GPUFS (#1230 ) GPU_VIPER.py was modified to use these options but they did not exist, breaking GPUFS. This commit adds them to fix the issue. Change-Id: I0095f400ea606c4e8d91a41870ef208465cef803	2024-06-13 11:23:50 -07:00
Jarvis Jia	b6b2e8c6c5	Black format Change-Id: If224c106262bae25127675160ea78386eedace3b	2024-06-12 15:57:04 -05:00
Jarvis Jia	0ebcddea95	Update apu_se.py to remove part not needed Change-Id: I06df4e0a67ccd2b7a45296ff65bf26c2b465a934	2024-06-12 15:54:13 -05:00
Jarvis Jia	4fea51b598	Black format change Change-Id: I95cbf5b97601ef3b6ca26bc1a1835305929ffcab	2024-06-10 22:52:56 -05:00
Jarvis Jia	8e268d42e2	gpu-compute: Provided m5ops support for gpu Adding m5 stat dump and reset into python script through different exit event Change-Id: I662233ae71e2987d90af1fd0100e29036b2ef1c6	2024-06-10 20:56:08 -05:00
Jarvis Jia	cf5e316a92	Change black format Change-Id: I3733b31baf187e0d3d38d971d9423a1b1afe2296	2024-06-10 16:33:18 -05:00
Jarvis Jia	ccdfe00998	gpu-compute: Added functions to choose replacement policies for GPU Adding RP_choose functions to change replacement policies among TreePLRU, LRU, FIFO, LFU, LIP, MRU, NRU, RRIP, SecondChance AND ShiPMem replacement policies for TCC, TCP and SQC caches for GPU Change-Id: If84a13babf1006ad41a557747c45d48ce2ce22a9	2024-06-10 16:22:41 -05:00
Jarvis Jia	c158ce22bf	gpu-compute: Added functions to choose replacement policies for GPU Adding RP_choose function to change replacement policies among TreePLRU, LRU, FIFO, LFU, LIP, MRU, NRU, RRIP, SecondChance AND ShiPMem replacement policies for TCC, TCP and SQC caches for GPU	2024-06-10 15:11:17 -05:00
Jarvis Jia	7c410797d1	Adding functions to choose replacement policies for GPU Adding RP_choose functions to change replacement policies among TreePLRU, LRU, FIFO, LFU, LIP, MRU, NRU, RRIP, SecondChance AND ShiPMem replacement policies for TCC, TCP and SQC caches for GPU	2024-06-10 14:09:09 -05:00
Jarvis Jia	5b44eca64e	Adding functions to choose replacement policies for GPU Adding RP_choose functions to change replacement policies among TreePLRU, LRU, FIFO, LFU, LIP, MRU, NRU, RRIP, SecondChance AND ShiPMem replacement policies for TCC, TCP and SQC caches for GPU	2024-06-10 13:58:24 -05:00
Matthew Poremba	6164835230	configs: GPUFS: MI300X Add a config capable of simulating MI300X ISA (gfx942). This is similar to the mi200.py config and uses the same scripts followed by some tuneable parameters. This config optionally lets the user call the runMI300GPU function with gem5 resources. This allows for something like the following before a VIPER stdlib python is available: ``` import mi300 from gem5.resources.resource import obtain_resource disk = obtain_resource("x86-gpu-fs-img") kernel = obtain_resource("x86-linux-kernel-5.4.0-105-generic") app = obtain_resource("square-gpu-test") mi300.runMI300GPUFS("X86KvmCPU", disk, kernel, app) ``` Tested cold boot config, checkpoint create and restore, and using gem5 resources. Change-Id: I50a13d7a3d207786b779bf7fd47a5645256b1e6a	2024-05-16 09:23:03 -07:00
Matthew Poremba	8be5ce6fc9	dev-amdgpu,configs,gpu-compute: Add gfx942 version This is the version for MI300. For the most part, it is the same as MI200 with the exception of architected flat scratch (not yet implemented in gem5) and therefore a new version enum is required. Change-Id: Id18cd7b57c4eebd467c010a3f61e3117beb8d58a	2024-05-15 12:08:41 -07:00
Lukas Zenick	b279e40cb7	configs: nvm sweep fix (#1114 ) These changes to sweep and sweep_hybrid for NVM allow them to run. I'm not an expert on this, so I'm not sure if these are technically correct, but they no longer fail when running `build/X86/gem5.opt configs/nvm/sweep.py` and `build/X86/gem5.opt configs/nvm/sweep_hybrid.py` GitHub Issue: #669	2024-05-13 14:51:39 -07:00
Harshil Patel	5c82447653	misc: Add resource versions to examples (#1110 ) - Explicitly defining resource version in obtain resource calls in examples. Change-Id: I74ab5d2f5e9bc73a0145585a0fe75f2ec905472f	2024-05-09 10:16:27 -07:00
Matthew Poremba	6ed446e546	arch-x86: Add XCR0 register and add to X86KvmCPU (#1040 ) The extended control registers were not being updated in the KVM thread context nor updated in the KVM state. This was causing issues when checkpointing since the XCR0 value was reverting to the default value rather than what it was previously before the checkpoint. This was causing multiple applications to crash due to executing instructions which are now illegal instructions due to XCR0 being incorrect. This commit adds the XCR0 as a misc register similar to the existing x86 control registers and adds all of the helper functions to access and set the register value. It also adds support for updating the KVM CPU's state with the register value and updating the thread context's misc reg value so that it is checkpointed along with the other misc regs. Note that this does not add support for XSAVE of the AVX state (i.e., the upper 128 bits of YMM registers). It does however fix the immediate problem in issue #958 . Change-Id: I97456c8b57cbc7b381bd4be94944ce6567a43c76	2024-05-06 09:58:07 -07:00
Matthew Poremba	cb47755e15	gpu: Consolidated fixes for v24.0 (#1103 ) Includes fixes for several bugs reported via email, self found, and internal reports. Also includes runs through Valgrind and UBsan. See individual commits for more details.	2024-05-06 07:35:57 -07:00
Matthew Poremba	0d3d456894	gpu-compute: Invalidate Scalar cache when SQC invalidates (#1093 ) The scalar cache is not being invalidated which causes stale data to be left in the scalar cache between GPU kernels. This commit sends invalidates to the scalar cache when the SQC is invalidated. This is a sufficient baseline for simulation. Since the number of invalidates might be larger than the mandatory queue can hold and no flash invalidate mechanism exists in the VIPER protocol, the command line option for the mandatory queue size is removed, which is the same behavior as the SQC. Change-Id: I1723f224711b04caa4c88beccfa8fb73ccf56572	2024-05-06 07:35:38 -07:00
Matthew Poremba	386fb3d1cc	configs: Fix HSA packet processor address The address has one too many zeros and is therefore placed in a memory region usually used for system memory. As a result this causes failure when trying to run a simulation with a huge amount of memory. Change the address to be within the C000'0000h - FFFF'FFFFh X86 I/O hole as was intended. Change-Id: I5d03ac19ea3b2c01a8c431073c12fa1868b3df24	2024-05-03 14:29:30 -07:00
Harshil Patel	1164f9b81e	tests: update resource to use new checkpoint - Updated the id of the simpoint-se-checkpoint resource. Change-Id: Iab0b10da87b9790c24407e0edce7a18c38e0f48a	2024-05-03 10:55:04 -07:00
Alexander Richardson	1bb5d3b99e	arch-riscv: Add support for RISC-V semihosting (#681 ) See https://github.com/riscv-software-src/riscv-semihosting for the current specification. Almost all code is shared with the Arm implementation. Tested by running some binaries built with [picolibc](https://github.com/picolibc/picolibc).	2024-04-27 05:12:32 -07:00
Matthew Poremba	c54039da5b	configs: GPUFS: Turn off SSE4 and fancy XSAVEs (#1041 ) A user reported a bug with the SSE4.1 version of memcmp in libc. When enabled the simulated program crashes with SIGILL. After attempting all fixes recommended by the Intel SDM without success, this commit turns the bit off instead. Similarly, the default XSAVE functionality is not completely implemented for AVX and newer ISA extensions. Therefore, there is not much point to claiming to support the more advanced versions of XSAVE (XSAVEOPT, XSAVEC, XSAVES, and XGETBV with ECX=1). Note that none of these bits are enabled for non-GPU full system simulations (see src/arch/x86/X86ISA.py). This only impacts GPUFS simulations. Change-Id: I8eb7bf0f2a0a29226095e7889fec9c1e8a65f88f	2024-04-20 11:04:59 -07:00
Bobby R. Bruce	3af15a535e	mem-cache, configs, arch-arm: Handle partitioning policies through a PartitionManager (#966 ) This PR offloads some of the partitioning logic to the partitioning manager, effectively changing the partitioning interface. Rather than always relying on the PartitionFieldExtention data structure to convey partition IDs, we make it implementation defined by introducing the partitioning manager abstraction. We want users to be able to extract the partitionId more flexibly and this requires using a SimObject. Users can extend the PartitioningManager, overriding the readPacketPartitionId, therefore providing their own means of injecting/extracting partitioning data from a packet.	2024-04-08 16:05:17 -07:00
Giacomo Travaglini	82a82c8793	configs: Change cache_partitioning.py to use PartitionManager Change-Id: I891cc4967dc5483313bcb1179d19b37123a37ba0 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-04-05 10:09:46 +01:00
Kaustav Goswami	28b081b348	arch-arm,stdlib: ARM release for_kvm is moved to configs (#986 ) This change sets the `release` of the ARM board at the config file instead of overriding the release on the ArmBoard. This change partially solves issue 932 as the system taking and restoring the checkpoint is consistent across KVM and timing CPUs respectively. Signed-off-by: Kaustav Goswami <kggoswami@ucdavis.edu>	2024-04-03 11:48:24 +01:00
Matthew Poremba	823b5a6eb8	dev-amdgpu: Support multiple CPs and MMIO AddrRanges Currently gem5 assumes that there is only one command processor (CP) which contains the PM4 packet processor. Some GPU devices have multiple CPs which the driver tests individually during POST if they are used or not. Therefore, these additional CPs need to be supported. This commit allows for multiple PM4 packet processors which represent multiple CPs. Each of these processors will have its own independent MMIO address range. To more easily support ranges, the MMIO addresses now use AddrRange to index a PM4 packet processor instead of the hard-coded constexpr MMIO start and size pairs. By default only one PM4 packet processor is created, meaning the functionality of the simulation is unchanged for devices currently supported in gem5. Change-Id: I977f4fd3a169ef4a78671a4fb58c8ea0e19bf52c	2024-03-21 10:13:55 -05:00
Matthew Poremba	6bbde8fbb8	dev-amdgpu: Rework handling of unknown registers The top level AMDGPUDevice currently reads/writes all unknown registers to/from a map containing the previously written value. This is intended as a way to handle registers that are not part of the model but the driver requires for functionality. Since this is at the top level, it can mask changes to register values which do not go through the same interface. For example, reading an MMIO, changing via PM4 queue, and reading again returns the stale cached value. This commit removes the usage of the regs map in AMDGPUDevice, implements some important MMIOs that were previously handled by it, and moves the unknown register handling to the NBIO aperture only. To reduce the number of additional MMIOs to implement, the display manager in vega10 is now disabled. Change-Id: Iff0a599dd82d663c7e710b79c6ef6d0ad1fc44a2	2024-03-21 10:10:01 -05:00
Michael Boyer	acd9d3ff94	gpu-compute: Add support for skipping GPU kernels (#940 ) This commit adds two new command-line options: --skip-until-gpu-kernel N Skips (non-blit) GPU kernels until the target kernel is reached. Execution continues normally from there. Blit kernels are not skipped because they are responsible for copying the kernel code and metadata for the non-blit kernels. Note that skipping kernels can impact correctness; this feature is only useful if the kernel of interest has no data-dependent behavior, or its data-dependent behavior is not based on data generated by the skipped kernels. --exit-after-gpu-kernel N Ends the simulation after completing (non-blit) GPU kernel N. This commit also renames two existing command-line options: --debug-at-gpu-kernel -> --debug-at-gpu-task --exit-at-gpu-kernel -> --exit-at-gpu-task These were renamed because they count GPU tasks, which include both kernels launched by the application as well as blit kernels. Change-Id: If250b3fd2db05c1222e369e9e3f779c4422074bc	2024-03-21 07:46:27 -07:00
Michael Boyer	ba2f5615ba	gpu-compute: Support cache line sizes >64B in GPUFS (#939 ) This change fixes two issues: 1) The --cacheline_size option was setting the system cache line size but not the Ruby cache line size, and the mismatch was causing assertion failures. 2) The submitDispatchPkt() function accesses the kernel object in chunks, with the chunk size equal to the cache line size. For cache line sizes >64B (e.g. 128B), the kernel object is not guaranteed to be aligned to a cache line and it was possible for a chunk to be partially contained in two separate device memories, causing the memory access to fail. Change-Id: I8e45146901943e9c2750d32162c0f35c851e09e1 Co-authored-by: Michael Boyer <Michael.Boyer@amd.com>	2024-03-20 11:09:25 -07:00
Giacomo Travaglini	058dd7e195	configs, tests: Amend stdlib configs to use WalkCache hierarchy As X86 and RISCV are relying on a Table Walker cache, we change their stdlib configs to use the newly defined PrivateL1PrivateL2WalkCacheHierarchy Change-Id: I63c3f70a9daa3b2c7a8306e51af8065bf1bea92b Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-03-18 09:42:05 +00:00
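
Several commits above (f2f86a3e42, c10feed524, e1db67c4bd) switch size strings from metric units (kB, MB, GB) to binary units (KiB, MiB, GiB). The following is a minimal sketch of why the distinction matters, reading each suffix literally. The names `UNITS` and `size_in_bytes` are hypothetical and are not part of gem5; in particular, this does not reproduce gem5's auto-conversion of metric sizes to binary ones.

```python
import re

# Multipliers for metric (base 10) and binary (base 2) suffixes.
# Hypothetical table for illustration only; not gem5's own parser.
UNITS = {
    "kB": 10**3, "MB": 10**6, "GB": 10**9,
    "KiB": 2**10, "MiB": 2**20, "GiB": 2**30,
}


def size_in_bytes(text: str) -> int:
    """Interpret a size string such as '64KiB' literally, with no auto-conversion."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]+)\s*", text)
    if match is None or match.group(2) not in UNITS:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


if __name__ == "__main__":
    # Print each label next to its literal byte count to show the gap.
    for label in ("64kB", "64KiB", "2GB", "2GiB"):
        print(label, size_in_bytes(label))
```

Read literally, a "64kB" cache would be 64000 bytes, which is not a power of two; writing "64KiB" states the intended 65536 bytes without relying on any conversion.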

1446 Commits
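
Commits 24504c9a3e and ba2f5615ba both concern accesses whose alignment depends on the cache line size (64B on the CPU side versus 128B on the GPU side). The sketch below only illustrates the underlying arithmetic; `align_down` and `split_into_lines` are hypothetical helpers, not functions from gem5, and this is not the fix those commits applied.

```python
from typing import Iterator, Tuple


def align_down(addr: int, line_size: int) -> int:
    """Round addr down to a multiple of line_size (assumed to be a power of two)."""
    return addr & ~(line_size - 1)


def split_into_lines(addr: int, size: int, line_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, length) pieces of [addr, addr + size) that never cross a line boundary."""
    end = addr + size
    while addr < end:
        next_boundary = align_down(addr, line_size) + line_size
        piece_end = min(end, next_boundary)
        yield addr, piece_end - addr
        addr = piece_end


if __name__ == "__main__":
    addr = 0x1040  # aligned to 64B, but not to 128B
    print(hex(align_down(addr, 64)), hex(align_down(addr, 128)))
    # A 128B read starting here straddles a 128B boundary and splits in two.
    print([(hex(start), length) for start, length in split_into_lines(addr, 128, 128)])
```

An address that is aligned for 64B lines can sit in the middle of a 128B line, so a chunk of one full line starting there covers parts of two lines.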