derek/gem5

Author	SHA1	Message	Date
Giacomo Travaglini	c9d9108978	arch-arm: MISCREG_AT_S1E2R/W are executable from S state (#1322 ) Change-Id: Ieaebdf0d62b5115f8085f478b2da105633b6a26a Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-07-04 09:37:17 +01:00
Giacomo Travaglini	f3e3c60805	arch-arm: Proper support for NonSecure IPA space in Secure state Change-Id: Ie2e2278ecdc5213db74999e3561b2918937c2c2e Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-07-02 13:16:13 +01:00
Giacomo Travaglini	eb400e773b	arch-arm: Remove makeStage2 from TLBIOp Change-Id: I25276e4b5b7c491e69208044ceb193c67ddfd91c Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-07-02 13:15:49 +01:00
Giacomo Travaglini	49ca08b01a	arch-arm: Add isStage2 qualifier to the LongDescriptor We are currently using the LongDescriptor for both stage1 and stage2 translations. There are several cases where the bitfield meaning changes depending on the translation stage. Change-Id: Ic33d9ef225a57fd79ce2b4bf47896aeb6bdd8d9c Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-07-02 13:15:31 +01:00
Giacomo Travaglini	9cce68ca71	arch-arm: Replace isSecure boolean with SecurityState enum Change-Id: If01b8b2811b2c028e669ea3700174c7945b07a06 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-07-02 12:45:24 +01:00
Alexander Richardson	d5c0383887	arch-arm: support 64-bit PMCCNTR from AArch32 (#1304 ) For ARMv8 CPUs this register allows reading a 64-bit cycle counter from the 32-bit execution state. Change-Id: I7cd9e2711ada5156920440cc3c89e7a74ca54a49	2024-07-02 08:59:44 +01:00
Giacomo Travaglini	b28659d4f9	arch-arm: Implement FEAT_XS (#1303 ) This patch is adding a functional implementation of FEAT_XS. Unless we operate with DVM enabled, TLBIs broadcasting is accomplished in 0 time; so there is no timing benefit introduced by enabling FEAT_XS other than the way it affects TLB management (invalidation) Change-Id: I067cb8b7702c59c40c9bbb8da536a0b7f3337b5d Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-07-02 08:52:59 +01:00
Matt Sinclair	04a3fd5b5d	gpu-compute,mem-ruby: Add RubyHitMiss flag for TCP and TCC cache (#1260 ) Add hit and miss print for TCP and TCC cache with RubyHitMiss debug flag Change-Id: I40ae3449020b917f39ac91d29fa4e1dd7c791e7b	2024-06-30 13:32:01 -05:00
Bobby R. Bruce	b3f23830c9	misc: Update versioning for develop branch Develop for v24.1 Change-Id: I4ef34c4a4ef67d171505ff9380746ae193655305	2024-06-27 23:36:07 -07:00
Bobby R. Bruce	6fcc13cf55	misc: Merge branch stable into develop This guarantees all changes put on the staging branch and, for whatever reason, put on stable are on develop. This syncs the branches. Change-Id: Ib3513f49977bb4ed3046c2d9d6cf162953b15887	2024-06-27 23:27:21 -07:00
Harshil Patel	3acb6e59cf	resources: Update elfie.py to work with obtain_resources (#1289 ) Change-Id: I08c5e50a150c8434c6c2ca36af81fb6ec3915af8	2024-06-27 20:02:57 -07:00
Jarvis Jia	f56571fed9	Merge branch 'develop' into rubyhitmiss	2024-06-27 21:45:08 +08:00
Rajesh Shashi Kumar	3ce5e0584a	arch-arm: This commit fixes a typo in the ARM ldaddalx instruction (#1279 ) The acquire-release flavor of the ldadd instruction should read ldaddalx (eg. ldaddalb/ldaddalh) according to specification. However, this is currently noted as ldadd"la"x (eg. ldaddlab/ldaddlah). Issue: https://github.com/gem5/gem5/issues/1224 Change-Id: Ib932fa0e572207729c923c27f24c34cc21dff0e5 Co-authored-by: Bobby R. Bruce <bbruce@ucdavis.edu>	2024-06-26 09:03:50 -07:00
Harshil Patel	e0d03fbc2f	resources: fix check for additional_params for workloads Change-Id: I0a4b5f0eef6e2f9faf35cea8130572a066aab6cd	2024-06-26 07:13:04 -07:00
Harshil Patel	144a2071fe	resources: fix check for additional_params for workloads Change-Id: I0a4b5f0eef6e2f9faf35cea8130572a066aab6cd	2024-06-25 16:30:07 -07:00
Harshil Patel	241b8a09df	resources: Update client_query to trim gem5 version (#1284 ) - gem5 was querying with its full version string, `24.0.0.0`, while searching for resources. This caused resource lookups to fail on the staging branch. This change trims the gem5 version to just the major.minor version. Change-Id: I30c3a1b38c631981f797ef0fd2b616e6a66ca18e	2024-06-25 09:04:13 -07:00
Harshil Patel	52fde944a5	resources: Update client_query to trim gem5 version (#1284 ) - gem5 was querying with its full version string, `24.0.0.0`, while searching for resources. This caused resource lookups to fail on the staging branch. This change trims the gem5 version to just the major.minor version. Change-Id: I30c3a1b38c631981f797ef0fd2b616e6a66ca18e	2024-06-25 09:01:36 -07:00
Jarvis Jia	341c72839b	Fix hit issue Change-Id: I28745489de693591d5ad8453b035a8c782adaf1f	2024-06-24 11:19:51 -07:00
Jarvis Jia	21b69975a6	Fix compilation error Change-Id: I8273472b8d0cff8c02f2d1e1a9d66599af7c4866	2024-06-24 11:19:51 -07:00
Jarvis Jia	e957a882ed	gpu-compute,mem-ruby: Add RubyHitMiss flag for TCP and TCC cache Add hit and miss print for TCP and TCC cache with RubyHitMiss debug flag Change-Id: I40ae3449020b917f39ac91d29fa4e1dd7c791e7b	2024-06-24 11:19:51 -07:00
Mahyar Samani	21bd1c28ab	Adding an example for Spatter (#1272 ) This change adds a new utility function for processing Spatter traces into SpatterKernels under parse_kernels. Additionally, it adds documentation for all the utility functions in spatter_kernel.py. Lastly, it adds an example script for running one spatter trace using SpatterGenerator to the examples.	2024-06-21 02:26:58 -07:00
Mahyar Samani	30bfdc8e52	stdlib: Getter method to get monolith range. (#1273 ) This change extends the AbstractMemory class to add a getter method that allows other components to get the memory's range without interleaving. This method will be useful if other components in the system need to interleave the memory range differently from the way the memory has interleaved it.	2024-06-21 02:26:50 -07:00
Mahyar Samani	18bc5227f6	stdlib: Getter method to get monolith range. (#1273 ) This change extends the AbstractMemory class to add a getter method that allows other components to get the memory's range without interleaving. This method will be useful if other components in the system need to interleave the memory range differently from the way the memory has interleaved it.	2024-06-21 02:23:58 -07:00
Mahyar Samani	590bb1fbbb	Adding an example for Spatter (#1272 ) This change adds a new utility function for processing Spatter traces into SpatterKernels under parse_kernels. Additionally, it adds documentation for all the utility functions in spatter_kernel.py. Lastly, it adds an example script for running one spatter trace using SpatterGenerator to the examples.	2024-06-21 02:23:41 -07:00
Bobby R. Bruce	d9d7d7646a	misc: Update Doxygen version to v24.0.0.0 Change-Id: Ibaa04b09813a1d497727ed9d2a903ee2b3049ffd	2024-06-20 13:53:20 -07:00
Bobby R. Bruce	888bf0d693	base: Update src/base/version.cc for v24.0 Change-Id: Iac980772a42853f9bfbdadb65d5efc3c5fdb6aed	2024-06-20 13:53:07 -07:00
Jason Lowe-Power	013f773d31	arch-riscv: Fix TLB lookup with vaddrs (#1264 ) Previously, all of the TLB lookup/insert functions were using the full virtual addresses even though the variables in the functions said "vpn." This change explicitly converts the virtual address to the VPN without any least significant zeros for the offset. I.e., vpn >> page_size. The main bug solved in this changeset is that the asid was |'d with the upper bits of the virtual address, but sometimes these were all 1's. Therefore, you could get a TLB hit even if the ASID was different. Interestingly, the page that seemed to cause these issues was a 1 GiB page. This change also starts refactoring some of the page table details to support sv46 and sv57 page table formats. In my testing, the Linux kernel boot uses large pages (even OpenSBI uses large pages), so it seems that large pages also work. However, this seems like magic to me, so I'm not sure if it's correct. This change also updates some asserts and debug statements with more useful debugging information. Partially fixes #1235. More testing needs to be done to be confident.	2024-06-20 13:24:50 -07:00
Bobby R. Bruce	7137b73ca0	cpu: Fix `std::min` type mismatch in reg_class.hh (#1266 ) Introduced in #1234, this caused compilation to fail on Apple Silicon systems. This bug is the same as #582 where a more detailed explanation is provided.	2024-06-20 13:02:08 -07:00
Mahyar Samani	7ff1e381c9	cpu,stdlib: Fix Access Trace for Accessing Indices in SpatterGen (#1258 ) This change fixes the way indices are generated in a multi generator setup. It changes it from all cores generating the same trace of indices for accessing the index array to each core generating an interleaved subset of indices. For an example look below for traces (indices to index array) in a 2 core setup. Before: core_0: 0, 1, 2, 3, 4, 5, 6, 7, ... core_1: 0, 1, 2, 3, 4, 5, 6, 7, ... After: core_0: 0, 1, 2, 3, 8, 9, 10, 11, ... core_1: 4, 5, 6, 7, 12, 13, 14, 15, ... Additionally, this change fixes the SpatterKernel class in the standard library to comply with the change in the SpatterGen source code.	2024-06-20 11:24:44 -07:00
TiredTumblrina	9fb0b18863	gpu-compute,mem,systemc: This commit corrects typos of 'cache' (#1263 ) I noticed while using the stable branch that there were a few typos of the word 'cache' and so I've corrected a few files where I found such typos. Change-Id: I7c7f64812039f34fe39d0c45c4f5ce921cba06d0	2024-06-20 09:45:13 -07:00
Jason Lowe-Power	943daeb603	stdlib: Add function to append kernel args (#1262 ) Often, you want to add another argument to the default kernel arguments. This function allows you to do that on the `kernel_disk_workload` board mixin.	2024-06-20 09:14:55 -07:00
Bobby R. Bruce	1a00ecfaf9	stdlib,configs,tests: Add gem5 MultiSim (MultiProcessing for gem5) (#1167 )

This allows for multiple gem5 simulations to be spawned from a single parent gem5 process, as defined in a single gem5 configuration. In this design _all_ the `Simulator`s are defined in the simulation script and then added to the multisim module. For example:

```py
from gem5.simulate.simulator import Simulator
import gem5.utils.multisim as multisim

# Construct the board[0] and board[1] as you wish here...

simulator1 = Simulator(board=board[0], id="board-1")
simulator2 = Simulator(board=board[1], id="board-2")

multisim.add_simulator(simulator1)
multisim.add_simulator(simulator2)
```

This specifies that two simulations are to be run in parallel in separate threads: one specified by `simulator1` and another by `simulator2`. They are then added to MultiSim via the `multisim.add_simulator` function. The user can specify an id via the Simulator constructor. This is used to give each process a unique id and output directory name. Given this, the id should be a helpful name describing the simulation being specified. If not specified, one is automatically given.

To run these simulators we use `<gem5 binary> -m gem5.utils.multisim <script> -p <num_processes>`.

Note: multisim is an executable module in gem5. This is the same module we import into our scripts to add the simulators. This is an intentionally modular, encapsulated design. When the module processes a script it will schedule multiple gem5 jobs and, dependent on the number of processes specified, will create child gem5 processes to process these jobs (jobs are just gem5 simulations in this case). The `--processes` (`-p`) argument is optional and, if not specified, the max number of processes which can be run concurrently will be the number of available threads on the host system.

The id for each process is used to create a subdirectory inside the `outputdir` (`m5out`) of that id name. E.g., in the example above the IDs are `board-1` and `board-2`. Therefore the m5out directory will look as follows:

```sh
- m5out
    - board-1
        - stats.txt
        - config.ini
        - config.json
        - terminal.out
    - board-2
        - stats.txt
        - config.ini
        - config.json
        - terminal.out
```

Each simulation's output is encapsulated inside the subdirectory of the id name.

If the multisim configuration script is passed directly to gem5 (like a traditional gem5 configuration script, i.e.: `<gem5 binary> <script>`), the user may run a single simulation specified in that script by passing its id as an argument. E.g. `<gem5 binary> <script> board-1` will run the `board-1` simulation specified in `script`. If no argument is passed an Exception is raised asking the user to either specify one or use the MultiSim module if multiprocessing is needed.

If the user desires a list of ids of the simulations specified in a given MultiSim script, they can do so by passing the `--list` (`-l`) parameter to the config script. I.e., `<gem5 binary> <script> --list` will list all the IDs for all the simulations specified in `script`.

This change comes with two new example scripts found in 'configs/example/gem5_library/multisim' to demonstrate multisim in both an SE and FS mode simulation. Tests have been added which run these scripts as part of gem5's Daily suite of tests.

Notes
=====

* Bug fixed: The `NoCache` classic cache hierarchy has been modified so the Xbar is no longer set with a `__func__` call. This interfered with MultiProcessing as this structure is not serializable via Pickle. This was quite bad design anyway so should be changed.
* Change: `readfile_contents` parameter previously wrote its value to a file called "readfile" in the output directory. This has been changed to write to a file called "readfile_{hash}" with "{hash}" being a hash of the `readfile_contents`. This ensures that, during multisim running, this file is not overwritten by other processes.
* Removal note: This implementation supersedes the functionality outlined in 'src/python/gem5/utils/multiprocessing'. As such, this code has been removed.

Limitations/Things to Fix/Improve
=================================

* Though each Simulator process has its own output directory (a subdirectory within m5out, with an ID set by the user unique to that Simulator), the stdout and stderr are still output to the terminal, not the output directory. This results in:
    1. stdout and stderr data lost and not recorded for these runs.
    2. An incredibly noisy terminal output.
* Each process uses the same cached resources. While there are locks on resources when downloading, each process will hash the resources it requires to ensure they are valid. This is very inefficient in cases where resources are common between processes (e.g., you may have 10 processes each using the same disk image, with each process hashing the disk image independently to give the same result to validate the resources).

Change-Id: Ief5a3b765070c622d1f0de53ebd545c85a3f0eee
---------
Signed-off-by: Jason Lowe-Power <jason@lowepower.com> Co-authored-by: Jason Lowe-Power <jason@lowepower.com>	2024-06-18 09:34:39 -07:00
Bobby R. Bruce	3138c8a8b1	gpu-compute,mem-ruby: Revert "Add RubyHitMiss flag for TCP and TCC cache" (#1254 ) Reverts gem5/gem5#1226	2024-06-18 07:58:54 -07:00
Bobby R. Bruce	36f73f671d	cpu,stdlib: Adding Spatter (#1136 ) This PR adds source code for C++ implementation of SpatterGen as well as SpatterKernel. SpatterGen uses a PyBindMethod to add kernels to the backend code. This way the process of processing json files could be offloaded to python. In addition it adds standard library components for SpatterGenCore and SpatterGen. These two components follow the same structure as AbstractCore and AbstractProcessor. In addition spatter_kernel.py adds a definition for SpatterKernel in python to make adding kernels to C++ easier. Also it adds utility functions for parsing dictionaries read from json as well as partitioning traces for multicore setups.	2024-06-17 15:28:45 -07:00
Hoa Nguyen	15e0236a8b	arch,cpu,sim: Add mechanism to partially print vector regs (#1234 ) Currently, gem5's inst tracer prints the whole vector register container by default. The size of vector register containers in gem5 is the maximum size allowed by the ISA. For vector-length agnostic (VLA) vector registers, this means ARM SVE vector container is 2048 bits long, and RISC-V vector container is 65535 bits long. Note that VLA implementation in gem5 allows the vector length to be varied within the limit specified by the ISAs. However, in most use cases of gem5, the vector length is much less than 65535 bits. This causes two issues: (1) the vector container requires allocating and moving around a large amount of unused data while only a fraction of it is used, and (2) printing the execution trace of a vector register results in a wall of text with a small amount of useful data. This change addresses problem (2) by providing a mechanism to limit the amount of data printed by the instruction tracer. This is done by adding a function printing the first X bits of a vector register container, where X is the vector length determined at runtime, as opposed to the vector container size, which is determined at compilation time. Change-Id: I815fa5aa738373510afcfb0d544a5b19c40dc0c7 --------- Signed-off-by: Hoa Nguyen <hn@hnpl.org>	2024-06-17 14:05:47 -07:00
hahaxxz	fef6a97f93	mem-ruby: This commit fixes MI_example protocol (#1236 ) fix two bugs in MI_example-dir.sm: 1. Directory cannot handle DMA_READ & DMA_WRITE events in M_DRDI state. 2. Directory cannot handle PUTX_NotOwner events in {M_DWR, M_DRD, M_DRDI, M_DWRI} state. Github Issue: https://github.com/gem5/gem5/issues/1210 Change-Id: I52a9d674ce0688dcfbbcc2b583f17de95afdeb87	2024-06-17 12:45:11 -07:00
Hoa Nguyen	500da4306b	arch: Mark FailUnimplemented instructions as Invalid instructions (#1247 ) This is a follow-up on the discussion here [1]. The IsInvalid flag was previously defined as an instruction that does not appear in the ISA. However, a micro-architecture can choose to not recognize an instruction and raise an illegal instruction fault even if the instruction is in the ISA. This change modifies the definition of an Invalid instruction such that, if a StaticInst instruction is marked as IsInvalid, it means the instruction is not recognized by the decoder. This means that any instruction recognized by the decoder is not invalid, even if the instruction is not in the official ISA spec; e.g., m5 pseudo-instructions. Note that instructions that are recognized by the decoder but are chosen to act as a nop are not invalid. This applies to WarnUnimplemented instructions, e.g. hint instructions. [1] https://github.com/gem5/gem5/pull/1071 Change-Id: I1371b222d8b06793d47f434d0f148c5571672068 Signed-off-by: Hoa Nguyen <hn@hnpl.org>	2024-06-17 12:44:05 -07:00
Giacomo Travaglini	2804311f7b	cpu-o3: Revert "Do not set Executed on load instruction to be replayed" (#1251 ) Reverts gem5/gem5#1182 This is breaking O3 execution. Investigating the matter	2024-06-17 12:24:43 -07:00
Matt Sinclair	6776bebbf6	gpu-compute,mem-ruby: Add RubyHitMiss flag for TCP and TCC cache (#1226 ) Add hit and miss print for TCP and TCC cache with RubyHitMiss debug flag Change-Id: I4430532b901811e03d9b077b61e2eca4557b34e1	2024-06-17 12:47:47 -05:00
Jarvis Jia	3a2bf47d57	Add default value and change Ruby address format specifier Change-Id: I8fbaf34745e90589e610d3b9bd423937e7ebdc3d	2024-06-17 03:27:25 -05:00
Jarvis Jia	edb2e76077	Merge branch 'develop' into rubyhitmiss	2024-06-17 15:57:50 +08:00
Matthew Poremba	2b0ca93517	gpu-compute: Fix architected flat scratch Currently writing to SRF which is incorrect, as the physical register number can be clobbered by another wavefront if registers get renamed to the physical register number. Fix this by actually architecting the register, i.e., there is a dedicated "hardware" register in the wavefront class. Change-Id: I94e9e463eed348b2928cae884c1c20566c00984d	2024-06-15 15:46:33 -07:00
Matthew Poremba	2f5842d253	arch-vega: Add valid flag to ds_swizzle_b32 Currently the flag is just Load and there is a long comment explaining why. This does not meet any of the scoreboard check requirements: https://github.com/gem5/gem5/blob/develop/src/gpu-compute/scoreboard_check_stage.cc#L230-L241 Add a generic ALU flag as well so the instruction executes instead of panicking. Change-Id: I54b2d20d47fad5e8f05f927328433aab7db7d862	2024-06-15 14:28:59 -07:00
Matthew Poremba	42369eab2c	arch-vega: Implement MI300 FLAT SVE bit For scratch instructions only, this bit specifies if an offset in a VGPR should be used for address calculation. This is new in MI300 and was previously the LDS bit. The LDS bit is rarely used and in fact gem5 does not even check this bit. This fixes a bug when SADDR == 0x7f (i.e., no SGPR should be used) where a VGPR was being added to the address when it should have been ignored. Change-Id: I9864379692df6795b25b58b98825da05d18fc5db	2024-06-15 14:28:59 -07:00
Matthew Poremba	1dab4be002	arch-vega: Implement VOP3 V_FMAC_F32 A version of V_FMAC_F32 with extra modifiers from VOP3 format. Change-Id: Ib6b41b0a3ceb91269b91a0287dfc94bc73e4d217	2024-06-15 14:28:58 -07:00
Matthew Poremba	f91d14fe46	gpu-compute: Add MFMA stats (#1248 ) Add dynamic instruction counts for MFMAs. Change-Id: I976b01344577cf011aeb3dd648a8c0017281c4e3	2024-06-15 13:04:00 -07:00
Mahyar Samani	d661023de4	stdlib: Adding SpatterGenCore and SpatterGen This change adds code for SpatterGenCore and SpatterGen as well as SpatterKernel to the standard library. SpatterGenCore and SpatterGen follow the same structure as AbstractCore and AbstractProcessor. spatter_kernel.py adds utility functions to parse dictionaries as well as partition a list into multiple lists through interleaving to be used when setting up a multicore SpatterGen. Change-Id: I003553e97f901c0724f5feac0bb6e21a020bd6ad	2024-06-14 13:44:34 -07:00
Mahyar Samani	6695e5ef70	cpu: Adding SpatterGen This change adds source code for SpatterGen ClockedObject. The set of source code pushed includes code for SpatterKernel that tracks whether information is being gathered or scattered as well as the list of indices to be accessed. This model has PyBindMethod to add SpatterKernels from python. This way all the preparations for kernels can be done in python. SpatterGen has a few parameters that model limits on a few hardware resources in the backend of a processor, e.g. number of functional units to calculate effective address, the latency of calculating effective address, number of integer registers. Change-Id: I451ffb385180a914e884cab220928c5f1944b2e3	2024-06-14 10:45:09 -07:00
Minje Jun	b8e21a2d32	cpu-o3: Do not set Executed on load instruction to be replayed (#1182 ) A load instruction can be replayed when 1) it's strictly ordered or 2) it falls into load-store forwarding mismatch. Case 1 was considered in executeLoad function but the case 2 wasn't. It causes the case-2 replayed load instruction to violate the assertion condition "assert(!load_inst->isExecuted())" in LSQUnit::read. This commit fixes the problem by adding consideration of the case 2 in LSQUnit::executeLoad. Co-authored-by: Minje Jun <minje.jun@samsung.com>	2024-06-14 10:12:26 -07:00
Jason Lowe-Power	21ffd91529	cpu,arch: Add IsInvalid flag to Unknown insts (#1071 ) The IsInvalid flag indicates that the static instruction is not part of the executing ISA and not part of m5's pseudo-instructions. This flag provides a way to recognize an illegal instruction at the decode stage.	2024-06-13 16:26:35 -07:00
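
The per-core index interleaving described in commit 7ff1e381c9 (the SpatterGen access trace fix) can be illustrated with a short sketch. This is not the SpatterGen or `spatter_kernel.py` code: the function name `interleave_indices` is hypothetical, and the chunk size of 4 is an assumption read off the before/after traces in the commit message.

```python
from typing import List


def interleave_indices(
    num_indices: int, num_cores: int, core_id: int, chunk: int = 4
) -> List[int]:
    """Return the index-array positions assigned to one core.

    Hypothetical helper: the index space is cut into chunks of `chunk`
    consecutive indices, and the chunks are dealt out round-robin to cores.
    The chunk size of 4 is an assumption taken from the example traces.
    """
    stride = chunk * num_cores  # distance between chunks owned by the same core
    return [
        base + offset
        for base in range(core_id * chunk, num_indices, stride)
        for offset in range(chunk)
        if base + offset < num_indices  # drop the tail of a partial last chunk
    ]


# Two-core setup, matching the "After" traces in the commit message.
core_0 = interleave_indices(16, num_cores=2, core_id=0)
core_1 = interleave_indices(16, num_cores=2, core_id=1)
```

With 16 indices and two cores, `core_0` holds 0, 1, 2, 3, 8, 9, 10, 11 and `core_1` holds 4, 5, 6, 7, 12, 13, 14, 15, so every index is visited exactly once instead of once per core.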

15229 Commits
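
The version trimming described in commits 241b8a09df and 52fde944a5 (reducing `24.0.0.0` to major.minor before querying resources) might look roughly like the sketch below. The helper name `trim_version` is hypothetical and is not taken from gem5's `client_query` code; it only illustrates the idea.

```python
def trim_version(full_version: str) -> str:
    """Keep only the major.minor part of a dotted version string.

    Hypothetical helper for illustration; not gem5's actual implementation.
    """
    # Split on dots and rejoin the first two components.
    return ".".join(full_version.split(".")[:2])


trimmed = trim_version("24.0.0.0")  # "24.0"
```

A version that is already in major.minor form passes through unchanged.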