derek/gem5 - gem5 - Gitea: Git with a cup of tea

derek/gem5

Author	SHA1	Message	Date
Bobby R. Bruce	d11c40dcac	misc: Run `pre-commit run --all-files` This ensures `isort` is applied to all files in the repo. Change-Id: Ib7ced1c924ef1639542bf0d1a01c5737f6ba43e9	2023-11-29 22:06:41 -08:00
Giacomo Travaglini	a0a799f474	cpu: Disable CPU switching functionality with TraceCPU Now that the TraceCPU is no longer a BaseCPU we disable CPU switching functionality. AFAICS from the code, it seems like using m5.switchCpus was never really working. The takeOverFrom was described as being used when checkpointing (which is not really the case). Moreover the icache/dcache event loops were not checking if the CPU was switched out so the trace was always been consumed regardless of the BaseCPU state. Note: IMHO the only case where you might want to switch between an execution-driven CPU to the TraceCPU is when you want to warm your caches before the ROI. All other cases don't really make sense as with the TraceCPU there is no architectural state being maintained/updated. Change-Id: I0611359d2b833e1bc0762be72642df24a7c92b1e Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-09-12 15:50:05 +01:00
Giacomo Travaglini	9a5d900770	cpu: Stop treating TraceCPU as a BaseCPU This is fixing a recently reported issue [1] where it is not possible to use the TraceCPU to replay elastic traces It requires some architectural data structures (like ArchMMU, ArchDecoder...) which are no longer defined in the BaseCPU class at compilation time. Which Arch version should be used for a class (TraceCPU) that is supposed to be ISA agnostic ? Does it really make sense to define them for the TraceCPU? Those classes are not used anyway during trace replay and their sole purpose would just be to comply to the BaseCPU interface. As there is no elegant way to make things work, this patch stops treating the TraceCPU as a BaseCPU. While it philosophically makes sense to treat the TraceCPU as a common CPU (it sort of replays pre-executed instructions), a case can be made for considering it more like a traffic generator. [1]: https://github.com/gem5/gem5/issues/301 Change-Id: I7438169e8cc7fb6272731efb336ed2cf271c0844 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-09-12 15:49:29 +01:00
Hoa Nguyen	eac06ad681	python: Fix multiline quotes in a single line An example case, ```python mem_side_port = RequestPort( "This port sends requests and " "receives responses" ) ``` This is the residue of running the python formatter. This is done by finding all tokens matching the regex `"\s"(?![.;"])` and manually replacing them by empty strings. Change-Id: Icf223bbe889e5fa5749a81ef77aa6e721f38b549 Signed-off-by: Hoa Nguyen <hoanguyen@ucdavis.edu> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66111 Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu> Maintainer: Bobby Bruce <bbruce@ucdavis.edu> Tested-by: kokoro <noreply+kokoro@google.com>	2022-11-29 23:44:38 +00:00
Bobby R. Bruce	2bc5a8b71a	misc: Run pre-commit run on all files in repo The following command was run: ``` pre-commit run --all-files ``` This ensures all the files in the repository are formatted to pass our checks. Change-Id: Ia2fe3529a50ad925d1076a612d60a4280adc40de Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/62572 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>	2022-08-24 21:47:07 +00:00
Bobby R. Bruce	787204c92d	python: Apply Black formatter to Python files The command executed was `black src configs tests util`. Change-Id: I8dfaa6ab04658fea37618127d6ac19270028d771 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47024 Maintainer: Bobby Bruce <bbruce@ucdavis.edu> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-08-03 09:10:41 +00:00
Daniel R. Carvalho	974a47dfb9	misc: Adopt the gem5 namespace Apply the gem5 namespace to the codebase. Some anonymous namespaces could theoretically be removed, but since this change's main goal was to keep conflicts at a minimum, it was decided not to modify much the general shape of the files. A few missing comments of the form "// namespace X" that occurred before the newly added "} // namespace gem5" have been added for consistency. std out should not be included in the gem5 namespace, so they weren't. ProtoMessage has not been included in the gem5 namespace, since I'm not familiar with how proto works. Regarding the SystemC files, although they belong to gem5, they actually perform integration between gem5 and SystemC; therefore, it deserved its own separate namespace. Files that are automatically generated have been included in the gem5 namespace. The .isa files currently are limited to a single namespace. This limitation should be later removed to make it easier to accomodate a better API. Regarding the files in util, gem5:: was prepended where suitable. Notice that this patch was tested as much as possible given that most of these were already not previously compiling. Change-Id: Ia53d404ec79c46edaa98f654e23bc3b0e179fe2d Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/46323 Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu> Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-07-01 19:08:24 +00:00
Gabe Black	6687265fe2	cpu: Delete authors lists from the cpu directory. Change-Id: Icfba8e23b5f6820a6ddefe1a50abbe5f8825b7b5 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/25444 Maintainer: Gabe Black <gabeblack@google.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>	2020-02-17 21:51:23 +00:00
Andreas Sandberg	ef71a987c1	python: Don't assume SimObjects live in the global namespace The importer in Python 3 doesn't like the way we import SimObjects from the global namespace. Convert the existing SimObject declarations to import from m5.objects. As a side-effect, this makes these files consistent with configuration files. Change-Id: I11153502b430822130722839e1fa767b82a027aa Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/15981 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2019-02-12 09:43:00 +00:00
Radhika Jagtap	48333e7e3f	cpu: Print progress messages in Trace CPU This change adds the ability to print a message at intervals of committed instruction count to indicate progress in the trace replay. Change-Id: I8363502354c42bfc52936d2627986598b63a5797 Reviewed-by: Rekai Gonzalez Alberquilla <rekai.gonzalezalberquilla@arm.com> Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/2321 Reviewed-by: Jason Lowe-Power <jason@lowepower.com>	2017-03-16 13:52:40 +00:00
Radhika Jagtap	1fe5f63137	cpu: Support exit when any one Trace CPU completes replay This change adds a Trace CPU param to exit simulation early, i.e. when the first (any one) trace execution is complete. With this change the user gets a choice to configure exit as either when the last CPU finishes (default) or first CPU finishes replay. Configuring an early exit enables simulating and measuring stats strictly when memory-system resources are being stressed by all Trace CPUs. Change-Id: I3998045fdcc5cd343e1ca92d18dd7f7ecdba8f1d Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>	2016-09-15 18:01:20 +01:00
Radhika Jagtap	d7724d5f54	cpu: Add frequency scaling to the Trace CPU This change adds a simple feature to scale the frequency of the Trace CPU. The compute delays in the input traces provide timing. This change adds a freqency multiplier parameter to the Trace CPU set to 1.0 by default. The compute delay is manipulated to effectively achieve the frequency at which the nodes become ready and thus scale the frequency of the Trace CPU. Change-Id: Iaabbd57806941ad56094fcddbeb38fcee1172431 Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>	2016-09-15 18:01:09 +01:00
Radhika Jagtap	7d3600c309	cpu: Add TraceCPU to playback elastic traces This patch defines a TraceCPU that replays trace generated using the elastic trace probe attached to the O3 CPU model. The elastic trace is an execution trace with data dependencies and ordering dependencies annoted to it. It also replays fixed timestamp instruction fetch trace that is also generated by the elastic trace probe. The TraceCPU inherits from BaseCPU as a result of which some methods need to be defined. It has two port subclasses inherited from MasterPort for instruction and data ports. It issues the memory requests deducing the timing from the trace and without performing real execution of micro-ops. As soon as the last dependency for an instruction is complete, its computational delay, also provided in the input trace is added. The dependency-free nodes are maintained in a list, called 'ReadyList', ordered by ready time. Instructions which depend on load stall until the responses for read requests are received thus achieving elastic replay. If the dependency is not found when adding a new node, it is assumed complete. Thus, if this node is found to be completely dependency-free its issue time is calculated and it is added to the ready list immediately. This is encapsulated in the subclass ElasticDataGen. If ready nodes are issued in an unconstrained way there can be more nodes outstanding which results in divergence in timing compared to the O3CPU. Therefore, the Trace CPU also models hardware resources. A sub-class to model hardware resources is added which contains the maximum sizes of load buffer, store buffer and ROB. If resources are not available, the node is not issued. The 'depFreeQueue' structure holds nodes that are pending issue. Modeling the ROB size in the Trace CPU as a resource limitation is arguably the most important parameter of all resources. The ROB occupancy is estimated using the newly added field 'robNum'. We need to use ROB number as sequence number is at times much higher due to squashing and trace replay is focused on correct path modeling. A map called 'inFlightNodes' is added to track nodes that are not only in the readyList but also load nodes that are executed (and thus removed from readyList) but are not complete. ReadyList handles what and when to execute next node while the inFlightNodes is used for resource modelling. The oldest ROB number is updated when any node occupies the ROB or when an entry in the ROB is released. The ROB occupancy is equal to the difference in the ROB number of the newly dependency-free node and the oldest ROB number in flight. If no node dependends on a non load/store node then there is no reason to track it in the dependency graph. We filter out such nodes but count them and add a weight field to the subsequent node that we do include in the trace. The weight field is used to model ROB occupancy during replay. The depFreeQueue is chosen to be FIFO so that child nodes which are in program order get pushed into it in that order and thus issued in the in program order, like in the O3CPU. This is also why the dependents is made a sequential container, std::set to std::vector. We only check head of the depFreeQueue as nodes are issued in order and blocking on head models that better than looping the entire queue. An alternative choice would be to inspect top N pending nodes where N is the issue-width. This is left for future as the timing correlation looks good as it is. At the start of an execution event, first we attempt to issue such pending nodes by checking if appropriate resources have become available. If yes, we compute the execute tick with respect to the time then. Then we proceed to complete nodes from the readyList. When a read response is received, sometimes a dependency on it that was supposed to be released when it was issued is still not released. This occurs because the dependent gets added to the graph after the read was sent. So the check is made less strict and the dependency is marked complete on read response instead of insisting that it should have been removed on read sent. There is a check for requests spanning two cache lines as this condition triggers an assert fail in the L1 cache. If it does then truncate the size to access only until the end of that line and ignore the remainder. Strictly-ordered requests are skipped and the dependencies on such requests are handled by simply marking them complete immediately. The simulated seconds can be calculated as the difference between the final_tick stat and the tickOffset stat. A CountedExitEvent that contains a static int belonging to the Trace CPU class as a down counter is used to implement multi Trace CPU simulation exit.	2015-12-07 16:42:15 -06:00

13 Commits