derek/gem5

Author	SHA1	Message	Date
Bobby R. Bruce	6e4c1c5db7	scons: Remove -Werror for the gem5 v24.0 release Removing the -Werror flag on the stable branch ensures that as new compilers are released (likely with stricter warnings) gem5 remains compilable. Change-Id: I0267c895414b630c1d7cd9b28236249790b3006f	2024-06-20 14:47:09 -07:00
Bobby R. Bruce	ec120e0c58	util-docker: Update devcontainer Dockerfile for v24.0 Change-Id: Id21fb1b12d8ad58338233d4f32be5b57e025f18b	2024-06-20 14:31:12 -07:00
Bobby R. Bruce	d9d7d7646a	misc: Update Doxygen version to v24.0.0.0 Change-Id: Ibaa04b09813a1d497727ed9d2a903ee2b3049ffd	2024-06-20 13:53:20 -07:00
Bobby R. Bruce	888bf0d693	base: Update src/base/version.cc for v24.0 Change-Id: Iac980772a42853f9bfbdadb65d5efc3c5fdb6aed	2024-06-20 13:53:07 -07:00
Jason Lowe-Power	013f773d31	arch-riscv: Fix TLB lookup with vaddrs (#1264 ) Previously, all of the TLB lookup/insert functions were using the full virtual addresses even though the variables in the functions said "vpn." This change explicitly converts the virtual address to the VPN without any least significant zeros for the offset. I.e., vpn >> page_size. The main bug solved in this changeset is the asid was \|'d with the upper bits of the virtual address, but sometimes there were all 1's. Therefore, you could get a TLB hit even if the ASID was different. Interestingly, the page that seemed to cause these issues was a 1 GiB page. This change also starts refactoring some of the page table details to support sv46 and sv57 page table formats. In my testing, the Linux kernel boot uses large pages (even OpenSBI uses large pages), so it seems that large pages also work. However, this seems like magic to me, so I'm not sure if it's correct. This change also updates some asserts, and debug statements with more useful debugging information. Partially fixes #1235. More testing needs to be done to be confident.	2024-06-20 13:24:50 -07:00
Bobby R. Bruce	7137b73ca0	cpu: Fix `std::min` type mismatch in reg_class.hh (#1266 ) Introduced in #1234, this caused compilation to fail on Apple Silicon systems. This bug is the same as #582 where a more detailed explanation is provided.	2024-06-20 13:02:08 -07:00
Mahyar Samani	7ff1e381c9	cpu,stdlib: Fix Access Trace for Accessing Indices in SpatterGen (#1258 ) This change fixes the way indices are generated in a multi generator setup. It changes it from all cores generating the same trace of indices for accessing the index array to each core generating an interleaved subset of indices. For an example look below for traces (indices to index array) in a 2 core setup. Before: core_0: 0, 1, 2, 3, 4, 5, 6, 7, ... core_1: 0, 1, 2, 3, 4, 5, 6, 7, ... After: core_0: 0, 1, 2, 3, 8, 9, 10, 11, ... core_1: 4, 5, 6, 7, 12, 13, 14, 15, ... Additionally, this change fixes the SpatterKernel class in the standard library to comply with the change in the SpatterGen source code.	2024-06-20 11:24:44 -07:00
Matthew Poremba	ed860dfe54	configs: Check before use replacement policy options (#1261 ) Rather than adding the options to every config that might be using GPU_VIPER.py, just change the Ruby config to check if the option is available before trying to use it. Otherwise, reverts to what was the default on stable. Change-Id: Ia6f1d0827d489ee2a35c598b644461cbff59e247	2024-06-20 09:50:29 -07:00
TiredTumblrina	9fb0b18863	gpu-compute,mem,systemc: This commit corrects typos of 'cache' (#1263 ) I noticed while using the stable branch that there were a few typos of the word 'cache' and so I've corrected a few files where I found such typos. Change-Id: I7c7f64812039f34fe39d0c45c4f5ce921cba06d0	2024-06-20 09:45:13 -07:00
Jason Lowe-Power	943daeb603	stdlib: Add function to append kernel args (#1262 ) Often, you want to add another argument to the default kernel arguments. This function allows you to do that on the `kernel_disk_workload` board mixin.	2024-06-20 09:14:55 -07:00
Bobby R. Bruce	25d614e4ce	tests: Fix x86_boot_exit_run.py 'set_max_ticks' typo (#1267 )	2024-06-20 00:31:23 -07:00
Ivana Mitrovic	e88f0944e3	util: Bump urllib3 in gem5-resource-manager (#1257 ) Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.0.7 to 2.2.2. Change-Id: I218236ff9ebe99839e417b67e740e6f98c0ee473	2024-06-18 11:05:13 -07:00
Bobby R. Bruce	9fe2bc9edc	util-docker: Update devcontainer to use Ubuntu 24.04 (#1256 ) Change-Id: I0e0dbaca2194c7f0ff5de54a49888da1c938c2de	2024-06-18 09:35:18 -07:00
Bobby R. Bruce	1a00ecfaf9	stdlib,configs,tests: Add gem5 MultiSim (MultiProcessing for gem5) (#1167 ) This allows for multiple gem5 simulations to be spawned from a single parent gem5 process, as defined in a single gem5 configuration. In this design _all_ the `Simulator`s are defined in the simulation script and then added to the multisim module. For example: ```py from gem5.simulate.simulator import Simulator import gem5.utils.multisim as multisim # Construct the board[0] and board[1] as you wish here... simulator1 = Simulator(board=board[0], id="board-1") simulator2 = Simulator(board=board[1], id="board-2") multisim.add_simulator(simulator1) multisim.add_simulator(simulator2) ``` This specifies that two simulations are to be run in parallel in separate threads: one specified by `simulator1` and another by `simulator2`. They are then added to MultiSim via the `multisim.add_simulator` function. The user can specify an id via the Simulator constructor. This is used to give each process a unique id and output directory name. Given this, the id should be a helpful name describing the simulation being specified. If not specified one is automatically given. To run these simulators we use `<gem5 binary> -m gem5.utils.multisim <script> -p <num_processes>`. Note: multisim is an executable module in gem5. This is the same module we import into our scripts to add the simulators. This is an intentionally modular encapsulated design. When the module processes a script it will schedule multiple gem5 jobs and, dependent on the number of processes specified, will create child gem5 processes to process these jobs (jobs are just gem5 simulations in this case). The `--processes` (`-p`) argument is optional and if not specified the max number of processes which can be run concurrently will be the number of available threads on the host system. The id for each process is used to create a subdirectory inside the `outputdir` (`m5out`) of that id name. 
E.g., in the example above the IDs are `board-1` and `board-2`. Therefore the m5out directory will look as follows: ```sh - m5out - board-1 - stats.txt - config.ini - config.json - terminal.out - board-2 - stats.txt - config.ini - config.json - terminal.out ``` Each simulation's output is encapsulated inside the subdirectory of the id name. If the multisim configuration script is passed directly to gem5 (like a traditional gem5 configuration script, i.e.: `<gem5 binary> <script>`), the user may run a single simulation specified in that script by passing its id as an argument. E.g. `<gem5 binary> <script> board-1` will run the `board-1` simulation specified in `script`. If no argument is passed an Exception is raised asking the user to either specify or use the MultiSim module if multiprocessing is needed. If the user desires a list of ids of the simulations specified in a given MultiSim script, they can do so by passing the `--list` (`-l`) parameter to the config script. I.e., `<gem5 binary> <script> --list` will list all the IDs for all the simulations specified in `script`. This change comes with two new example scripts found in "configs/example/gem5_library/multisim" to demonstrate multisim in both an SE and FS mode simulation. Tests have been added which run these scripts as part of gem5's Daily suite of tests. Notes ===== * Bug fixed: The `NoCache` classic cache hierarchy has been modified so the Xbar is no longer set with a `__func__` call. This interfered with MultiProcessing as this structure is not serializable via Pickle. This was quite bad design anyway so should be changed * Change: `readfile_contents` parameter previously wrote its value to a file called "readfile" in the output directory. This has been changed to write to a file called "readfile_{hash}" with "{hash}" being a hash of the `readfile_contents`. This ensures that, during multisim running, this file is not overwritten by other processes. 
* Removal note: This implementation supersedes the functionality outlined in 'src/python/gem5/utils/multiprocessing'. As such, this code has been removed. Limitations/Things to Fix/Improve ================================= * Though each Simulator process has its own output directory (a subdirectory within m5out, with an ID set by the user unique to that Simulator), the stdout and stderr are still output to the terminal, not the output directory. This results in: 1. stdout and stderr data lost and not recorded for these runs. 2. An incredibly noisy terminal output. * Each process uses the same cached resources. While there are locks on resources when downloading, each process will hash the resources it requires to ensure they are valid. This is very inefficient in cases where resources are common between processes (e.g., you may have 10 processes each using the same disk image with each process hashing the disk images independently to give the same result to validate the resources). Change-Id: Ief5a3b765070c622d1f0de53ebd545c85a3f0eee --------- Signed-off-by: Jason Lowe-Power <jason@lowepower.com> Co-authored-by: Jason Lowe-Power <jason@lowepower.com>	2024-06-18 09:34:39 -07:00
Bobby R. Bruce	3138c8a8b1	gpu-compute,mem-ruby: Revert "Add RubyHitMiss flag for TCP and TCC cache" (#1254 ) Reverts gem5/gem5#1226	2024-06-18 07:58:54 -07:00
Bobby R. Bruce	36f73f671d	cpu,stdlib: Adding Spatter (#1136 ) This PR adds source code for C++ implementation of SpatterGen as well as SpatterKernel. SpatterGen uses a PyBindMethod to add kernels to the backend code. This way the process of processing json files could be offloaded to python. In addition it adds standard library components for SpatterGenCore and SpatterGen. These two components follow the same structure as AbstractCore and AbstractProcessor. In addition spatter_kernel.py adds a definition for SpatterKernel in python to make adding kernels to C++ easier. Also it adds utility functions for parsing dictionaries read from json as well as partitioning traces for multicore setups.	2024-06-17 15:28:45 -07:00
Hoa Nguyen	15e0236a8b	arch,cpu,sim: Add mechanism to partially print vector regs (#1234 ) Currently, gem5's inst tracer prints the whole vector register container by default. The size of vector register containers in gem5 is the maximum size allowed by the ISA. For vector-length agnostic (VLA) vector registers, this means ARM SVE vector container is 2048 bits long, and RISC-V vector container is 65535 bits long. Note that VLA implementation in gem5 allows the vector length to be varied within the limit specified by the ISAs. However, in most use cases of gem5, the vector length is much less than 65535 bits. This causes two issues: (1) the vector container requires allocating and moving around a large amount of unused data while only a fraction of it is used, and (2) printing the execution trace of a vector register results in a wall of text with a small amount of useful data. This change addresses problem (2) by providing a mechanism to limit the amount of data printed by the instruction tracer. This is done by adding a function printing the first X bits of a vector register container, where X is the vector length determined at runtime, as opposed to the vector container size, which is determined at compilation time. Change-Id: I815fa5aa738373510afcfb0d544a5b19c40dc0c7 --------- Signed-off-by: Hoa Nguyen <hn@hnpl.org>	2024-06-17 14:05:47 -07:00
hahaxxz	fef6a97f93	mem-ruby: This commit fixes MI_example protocol (#1236 ) fix two bugs in MI_example-dir.sm: 1. Directory cannot handle DMA_READ & DMA_WRITE events in M_DRDI state. 2. Directory cannot handle PUTX_NotOwner events in {M_DWR, M_DRD, M_DRDI, M_DWRI} state. Github Issue: https://github.com/gem5/gem5/issues/1210 Change-Id: I52a9d674ce0688dcfbbcc2b583f17de95afdeb87	2024-06-17 12:45:11 -07:00
Hoa Nguyen	500da4306b	arch: Mark FailUnimplemented instructions as Invalid instructions (#1247 ) This is a follow-up on the discussion here [1]. The IsInvalid flag was previously defined as an instruction that does not appear in the ISA. However, a micro-architecture can choose not to recognize an instruction and raise an illegal instruction fault even if the instruction is in the ISA. This change modifies the definition of an Invalid instruction such that, if a StaticInst instruction is marked as IsInvalid, it means the instruction is not recognized by the decoder. This means that any instruction recognized by the decoder is not invalid, even if the instruction is not in the official ISA spec; e.g., m5 pseudo-instructions. Note that instructions that are recognized by the decoder but are chosen to act as a nop are not invalid. This applies to WarnUnimplemented instructions, e.g. hint instructions. [1] https://github.com/gem5/gem5/pull/1071 Change-Id: I1371b222d8b06793d47f434d0f148c5571672068 Signed-off-by: Hoa Nguyen <hn@hnpl.org>	2024-06-17 12:44:05 -07:00
Giacomo Travaglini	2804311f7b	cpu-o3: Revert "Do not set Executed on load instruction to be replayed" (#1251 ) Reverts gem5/gem5#1182 This is breaking O3 execution. Investigating the matter	2024-06-17 12:24:43 -07:00
Matt Sinclair	6776bebbf6	gpu-compute,mem-ruby: Add RubyHitMiss flag for TCP and TCC cache (#1226 ) Add hit and miss print for TCP and TCC cache with RubyHitMiss debug flag Change-Id: I4430532b901811e03d9b077b61e2eca4557b34e1	2024-06-17 12:47:47 -05:00
Matthew Poremba	50e4209a4a	arch-vega: Various MI300 fixes for PyTorch tests (#1249 ) - Fix address calculation issue with scratch_* instructions when SVE bit is 0. - Fix ds_swizzle_b32 not mapping to execution unit. - Implement VOP3 V_FMAC_B32. - Fix architected scratch address register being clobbered. Tested with MNIST from PyTorch quickstart tutorial and nanoGPT on mi300.py.	2024-06-17 07:59:47 -07:00
Jarvis Jia	3a2bf47d57	Add default value and change Ruby address format specifier Change-Id: I8fbaf34745e90589e610d3b9bd423937e7ebdc3d	2024-06-17 03:27:25 -05:00
Jarvis Jia	edb2e76077	Merge branch 'develop' into rubyhitmiss	2024-06-17 15:57:50 +08:00
Matthew Poremba	2b0ca93517	gpu-compute: Fix architected flat scratch Currently writing to SRF which is incorrect, as the physical register number can be clobbered by another wavefront if registers get renamed to the physical register number. Fix this by actually architecting the register, i.e., there is a dedicated "hardware" register in the wavefront class. Change-Id: I94e9e463eed348b2928cae884c1c20566c00984d	2024-06-15 15:46:33 -07:00
Matthew Poremba	2f5842d253	arch-vega: Add valid flag to ds_swizzle_b32 Currently the flag is just Load and there is a long comment explaining why. This does not meet any of the scoreboard check requirements: https://github.com/gem5/gem5/blob/develop/src/gpu-compute/scoreboard_check_stage.cc#L230-L241 Add a generic ALU flag as well so the instruction executes instead of panicking. Change-Id: I54b2d20d47fad5e8f05f927328433aab7db7d862	2024-06-15 14:28:59 -07:00
Matthew Poremba	42369eab2c	arch-vega: Implement MI300 FLAT SVE bit For scratch instructions only, this bit specifies if an offset in a VGPR should be used for address calculation. This is new in MI300 and was previously the LDS bit. The LDS bit is rarely used and in fact gem5 does not even check this bit. This fixes a bug when SADDR == 0x7f (i.e., no SGPR should be used) where a VGPR was being added to the address when it should have been ignored. Change-Id: I9864379692df6795b25b58b98825da05d18fc5db	2024-06-15 14:28:59 -07:00
Matthew Poremba	1dab4be002	arch-vega: Implement VOP3 V_FMAC_F32 A version of V_FMAC_F32 with extra modifiers from VOP3 format. Change-Id: Ib6b41b0a3ceb91269b91a0287dfc94bc73e4d217	2024-06-15 14:28:58 -07:00
Matthew Poremba	f91d14fe46	gpu-compute: Add MFMA stats (#1248 ) Add dynamic instruction counts for MFMAs. Change-Id: I976b01344577cf011aeb3dd648a8c0017281c4e3	2024-06-15 13:04:00 -07:00
Mahyar Samani	d661023de4	stdlib: Adding SpatterGenCore and SpatterGen This change adds code for SpatterGenCore and SpatterGen as well as SpatterKernel to the standard library. SpatterGenCore and SpatterGen follow the same structure as AbstractCore and AbstractProcessor. spatter_kernel.py adds utility functions to parse dictionaries as well as partition a list into multiple lists through interleaving to be used when setting up a multicore SpatterGen. Change-Id: I003553e97f901c0724f5feac0bb6e21a020bd6ad	2024-06-14 13:44:34 -07:00
Mahyar Samani	6695e5ef70	cpu: Adding SpatterGen This change adds source code for SpatterGen ClockedObject. The set of source code pushed includes code for SpatterKernel that tracks whether information is being gathered or scattered as well as the list of indices to be accessed. This model has PyBindMethod to add SpatterKernels from python. This way all the preparations for kernels can be done in python. SpatterGen has a few parameters that model limits on a few hardware resources in the backend of a processor, e.g. number of functional units to calculate effective address, the latency of calculating effective address, number of integer registers. Change-Id: I451ffb385180a914e884cab220928c5f1944b2e3	2024-06-14 10:45:09 -07:00
Minje Jun	b8e21a2d32	cpu-o3: Do not set Executed on load instruction to be replayed (#1182 ) A load instruction can be replayed when 1) it's strictly ordered or 2) it falls into load-store forwarding mismatch. Case 1 was considered in executeLoad function but the case 2 wasn't. It causes the case-2 replayed load instruction to violate the assertion condition "assert(!load_inst->isExecuted())" in LSQUnit::read. This commit fixes the problem by adding consideration of the case 2 in LSQUnit::executeLoad. Co-authored-by: Minje Jun <minje.jun@samsung.com>	2024-06-14 10:12:26 -07:00
Matthew Poremba	3cf638e217	gpu-compute, util-m5: add GPU kernel exit events (#1217 ) The GPUFS scripts include support for dumping and resetting stats at kernel boundaries by identifying specific GPU kernel exit events. This commit extends that support to work with GPU SE-mode support. Change-Id: I662233ae71e2987d90af1fd0100e29036b2ef1c6	2024-06-14 08:13:27 -07:00
Jason Lowe-Power	21ffd91529	cpu,arch: Add IsInvalid flag to Unknown insts (#1071 ) The IsInvalid flag indicates that the static instruction is not part of the executing ISA and not part of m5's pseudo-instructions. This flag provides a way to recognize an illegal instruction at the decode stage.	2024-06-13 16:26:35 -07:00
Matthew Poremba	b3d9dc42d4	configs: Add replacement policy options for GPUFS (#1230 ) GPU_VIPER.py was modified to use these options but they did not exist, breaking GPUFS. This commit adds them to fix the issue. Change-Id: I0095f400ea606c4e8d91a41870ef208465cef803	2024-06-13 11:23:50 -07:00
Jarvis Jia	87c0d7732c	Merge branch 'develop' into rubyhitmiss	2024-06-12 17:30:35 -04:00
Jarvis Jia	edfc139c40	Change black format Change-Id: I3733b31baf187e0d3d38d971d9423a1b1afe2296 gpu-compute: add GPU RubyHitMiss for TCP and TCC Change-Id: I4430532b901811e03d9b077b61e2eca4557b34e1 gpu-compute: Add RubyHitMiss flag for TCP and TCC cache Change-Id: I4e5d1127c84b9eb1060ec9ba0b6638267449eda5 gpu-compute: Add RubyHitMiss flag for TCP and TCC cache Change-Id: I4e5d1127c84b9eb1060ec9ba0b6638267449eda5 Remove space Change-Id: I401f528c6f128ba0956bdbc232e8f2ae37bf648c	2024-06-12 16:04:36 -05:00
Jarvis Jia	b6b2e8c6c5	Black format Change-Id: If224c106262bae25127675160ea78386eedace3b	2024-06-12 15:57:04 -05:00
Jarvis Jia	0ebcddea95	Update apu_se.py to remove part not needed Change-Id: I06df4e0a67ccd2b7a45296ff65bf26c2b465a934	2024-06-12 15:54:13 -05:00
Matthew Poremba	be0a7937c1	mem-ruby: Fix deadlock in GPU_VIPER when issuing atomic requests (#1216 ) When a compute unit issues several requests to the same line, the requests wait in the L2 if it is a writeback cache. If the line is invalid initially and the first request is atomic in nature, the L2 cache issues a request to main memory. On data return, the cache line transitions to M but doesn't wake up the other requests, resulting in a deadlock. This commit adds a wakeup call on data return for atomics and fixes potential deadlocks.	2024-06-12 10:10:32 -07:00
Harshil Patel	74afea471d	cpu: Revert "Don't change to suspend if the thread status is halted" (#1225 ) Reverts gem5/gem5#1039	2024-06-12 00:20:06 -07:00
Bobby R. Bruce	f9abf6bb08	stdlib: Improve gem5 PyStats (#996 ) This PR incorporates numerous improvements and fixes to the gem5 PyStats. This includes: * PyStats now support SimObject Vectors. The PyStats representing them are subscriptable and therefore accessible by index, e.g., `simobjectvec[0]`. (This replaces the `Vector` group PyStat) * Adds the `SparseHist` PyStats. * Adds the `Vector2d` to PyStats. * The `Distribution` PyStats is fixed to be a vector of Scalars. * Tests added for the PyStat's Vector and bugs fixed.	2024-06-12 00:19:08 -07:00
Bobby R. Bruce	261490f23c	misc,tests: Revert merge version to 'v4' from 'v4.0.0' 'v4.0.0' wasn't working. The following error occurred: ``` Can't find 'action.yml', 'action.yaml' or 'Dockerfile' for action 'actions/upload-artifact/merge@v4.0.0'. ``` Change-Id: I658b0fe292df029501fbc1286acb06f4014ae4e1	2024-06-12 00:14:27 -07:00
Vishnu Ramadas	42b9a9666e	mem-ruby: Add instSeqNum to atomic responses from GPU L2 caches This commit adds instSeqNum to the atomic responses in GPU_VIPER-TCC.sm. This will be useful when debugging issues related to GPU atomic transactions Change-Id: Ic05c8e1a1cb230abfca2759b51e5603304aadaa3	2024-06-11 20:35:43 -05:00
Vishnu Ramadas	943d1f1453	mem-ruby: Fix deadlock in GPU_VIPER when issuing atomic requests When a compute unit issues several requests to the same line, the requests wait in the L2 if it is a writeback cache. If the line is invalid initially and the first request is atomic in nature, the L2 cache issues a request to main memory. On data return, the cache line transitions to M but doesn't wake up the other requests, resulting in a deadlock. This commit adds a wakeup call on data return for atomics and fixes potential deadlocks. Change-Id: I8200ce6e77da7c8b4db285c0cc8b8ca0dfa7d720	2024-06-11 20:33:46 -05:00
Bobby R. Bruce	7e45ec0ff0	stdlib: Fix m5.ext.pystats __init__.py Addresses Jason's complaint that wildcard imports should be avoided, in accordance with PEP008: https://github.com/gem5/gem5/pull/996#discussion_r1621051601. Change-Id: I72266df43d3ec4ede3f45c3e34e2e05e1990bd6b	2024-06-11 16:26:24 -07:00
Bobby R. Bruce	8fc4d3f793	misc,tests: Update daily test artifact actions to v4.0.0 Change-Id: I711fa36639e925ce958e0484a31ee6a4dde87dbe	2024-06-11 15:43:40 -07:00
Matt Sinclair	8a44e97a10	gpu-compute: Added functions to choose replacement policies for GPU (#1213 ) Adding RP_choose functions to change replacement policies among TreePLRU, LRU, FIFO, LFU, LIP, MRU, NRU, RRIP, SecondChance AND ShiPMem replacement policies for TCC, TCP and SQC caches for GPU	2024-06-11 15:08:42 -05:00
Hoa Nguyen	d528a6bd2d	arch: Flag all ISAs Unknown instruction as IsInvalid Change-Id: I096138a157c4e2063c5f4f4324c21c1463dddb65 Signed-off-by: Hoa Nguyen <hn@hnpl.org>	2024-06-11 18:48:29 +00:00
Hoa Nguyen	369029d2be	cpu: Add IsInvalid flag to StaticInstFlags The IsInvalid flag indicates that the static instruction is not part of the executing ISA and not part of m5's pseudo-instructions. This flag provides a way to recognize an illegal instruction at the decode stage. Change-Id: I2779c6edcd8c5e6a77ea11cad3ff73bacb79d800 Signed-off-by: Hoa Nguyen <hn@hnpl.org>	2024-06-11 18:48:29 +00:00
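
The interleaved index traces described in commit 7ff1e381c9 (SpatterGen access trace fix) can be illustrated with a minimal Python sketch. This is not the gem5 `SpatterKernel` code: the function name `interleaved_indices` and its parameters are hypothetical, and the block size of 4 is only inferred from the before/after traces quoted in that commit message.

```python
def interleaved_indices(core_id, num_cores, block_size, total):
    """Return the index-array positions visited by one core.

    Hypothetical helper: each core takes `block_size` consecutive
    indices, then skips the blocks that belong to the other cores.
    """
    stride = num_cores * block_size  # distance between a core's blocks
    indices = []
    for base in range(core_id * block_size, total, stride):
        # Clamp the final block so we never run past `total`.
        indices.extend(range(base, min(base + block_size, total)))
    return indices


# Two cores, blocks of four, sixteen indices in total.
for core in range(2):
    print(f"core_{core}:", interleaved_indices(core, 2, 4, 16))
```

With two cores and a block size of 4, core 0 gets 0-3 and 8-11 while core 1 gets 4-7 and 12-15, matching the "After" traces in the commit message.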

21742 Commits