derek/gem5

Author	SHA1	Message	Date
Matthew Poremba	9e6a87e67a	dev-amdgpu: Writeback PM4 queue rptr when empty (#597 ) The GPU device keeps a local copy of each ring buffer's read pointer (rptr) to avoid constant DMAs to/from host memory. This means it needs to be periodically updated on the host side, as the driver uses this to determine how much space is left in the queue and may hang if it believes the queue is full. For user-mode queues, this already happens when queues are unmapped. For kernel-mode queues (e.g., HIQ, KIQ) the rptr is never updated, leading to a hang. In this patch the rptr for all queues is reported back to the kernel whenever the queue reaches an empty state (rptr == wptr). Additionally, to handle PM4 queue wrap-around, the queue processing function checks if the queue is not empty instead of rptr < wptr. This is safe because the driver fills PM4 queues with NOP packets on initialization and when wrap-around occurs. Change-Id: Ie13a4354f82999208a75bb1eaec70513039ff30f	2023-11-27 11:02:11 -08:00
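The wrap-around point in 9e6a87e67a can be illustrated with a minimal Python sketch. It is not the gem5 implementation: `drain`, `old_cond`, `new_cond`, and the assumption that pointers wrap modulo the queue size are all hypothetical and used only for illustration.

```python
def drain(rptr, wptr, size, has_work):
    """Consume queue entries while has_work(rptr, wptr) holds.

    Assumption: pointers wrap modulo `size`. Returns the final rptr
    (the value that would be written back to the host once the queue
    is empty) and the number of entries consumed.
    """
    consumed = 0
    while has_work(rptr, wptr):
        rptr = (rptr + 1) % size
        consumed += 1
    return rptr, consumed


def old_cond(rptr, wptr):
    # Old check: stops early once wptr has wrapped below rptr.
    return rptr < wptr


def new_cond(rptr, wptr):
    # New check: the queue has work whenever it is not empty.
    return rptr != wptr
```

With `size=8`, `rptr=6`, and a wrapped `wptr=2`, the old condition consumes nothing, while the not-empty condition consumes the four pending entries and ends with `rptr == wptr`.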
Bobby R. Bruce	0f6eabe8c9	ext,github,tests: Update DRAMSys tests to v5.0 and handle new dependencies (#577 ) #525 Updated DRAMSys to v5.0. This PR further improves v5.0 incorporation into gem5 by better managing its new dependencies and updating the DRAMSys tests to use v5.0. This PR: 1. Adds a check which throws a warning if DRAMSys cannot be built due to a missing `cmake` instead of failing with a build error. `cmake` is not a hard gem5 requirement. It is only required to build DRAMSys in the cases it is required. It is therefore prudent to not fail a build in cases `cmake` is not present on the host system. 2. Updates the "all-dependency" Docker images to include the optional dependencies `git-lfs` (needed to clone the DRAMSys repo when running the command outlined in ext/dramsys/README -- introduced in #525) and `cmake` (needed to build DRAMSys). 3. Updates the Weekly workflow's `dramsys-tests`' `Checkout DRAMSys` job to clone DRAMSys in the same manner as outlined in ext/dramsys/README. This ensures the `dramsys-tests` test the instructions we give users. 4. `.gitignore` is added to ext/dramsys to ignore the ext/dramsys/DRAMSys directory when cloned for building and integration into gem5. (2.) Should fix our failing weekly tests: https://github.com/gem5/gem5/actions/runs/6912511984/job/18808339821 and (3.) will ensure the changes introduced in #525 are tested.	2023-11-27 09:37:11 -08:00
Harshil Patel	1de992bc75	tests: fix lulesh (#600 ) - fixed the broken command that was causing lulesh to fail the run Change-Id: I4e8a310f153d86deb8829f41b5ddd0c317df23cb	2023-11-27 07:42:59 -08:00
Matthew Poremba	cc9f81b08a	arch-vega,arch-gcn3: Bugfix V_PERM_B32 and V_OR3_B32 (#599 ) The V_PERM_B32 instruction is selecting the correct byte, but is shifting into place moving by bits instead of bytes. The V_OR3_B32 instruction is calling the wrong instruction implementation in the decoder. This patch fixes both issues plus a bonus fix for GCN3's V_PERM_B32. (GCN3 does not have V_OR3_B32). Change-Id: Ied66c43981bc4236f680db42a9868f760becc284	2023-11-26 23:22:01 -08:00
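The shift bug described in cc9f81b08a can be sketched as follows. This is a simplified byte permute, not the full V_PERM_B32 semantics: it only handles selectors 0-7 that index a byte of a 64-bit source, and `select_byte`, `permute_bytes`, and the `shift_unit` parameter are hypothetical names introduced for illustration.

```python
def select_byte(data64, sel):
    """Return byte number `sel` (0 = least significant) of a 64-bit value."""
    return (data64 >> (8 * sel)) & 0xFF


def permute_bytes(data64, selectors, shift_unit=8):
    """Build a 32-bit result from four selected bytes.

    shift_unit=8 places result byte i at bit 8*i (the intended behavior).
    shift_unit=1 mimics the kind of bug described: the right byte is
    selected, but it is moved into place by i bits instead of i bytes.
    """
    result = 0
    for i, sel in enumerate(selectors):
        result |= select_byte(data64, sel) << (shift_unit * i)
    return result & 0xFFFFFFFF
```

For `data64 = 0x8877665544332211` and selectors `[0, 2, 4, 6]`, the byte-granular shift yields `0x77553311`; the bit-granular shift smears the selected bytes over one another and gives a different value.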
Bobby R. Bruce	0b2c56ef66	mem-cache: Revert "Prefetchers Improvements" (#581 ) Reverts gem5/gem5#564 to fix #580. Discussion in #581 showed there may be a fix to this but reverting for now until a better solution is found.	2023-11-26 18:43:21 -08:00
Bobby R. Bruce	ab1d5dc3a0	arch-arm: Fix Virtual Interrupt logic in secure mode (#584 ) This PR is fixing remaining issues in the ArmISA::Interrupt class; more specifically it is enabling virtual interrupts in secure mode (when FEAT_SEL2 is present). Previous version was assuming no virtual interrupt was possible in secure mode. We fix this assumption by replacing the security check with the EL2Enabled helper which closely matches the Arm pseudocode	2023-11-26 18:11:08 -08:00
Bobby R. Bruce	36e83943b5	tests,misc: Update DRAMSys test clone command This clone is updated to reflect the new advice given in ext/dramsys/README that was introduced in PR https://github.com/gem5/gem5/pull/525 to upgrade DRAMSys to v5.0. Change-Id: I868619ecc1a44298dd3885e5719979bdaa24e9c2	2023-11-26 17:10:40 -08:00
Bobby R. Bruce	8f9a328652	util-docker: Add 'cmake' to all-deps 'cmake' is required to build DRAMSys, which is itself optional, so 'cmake' is not a hard requirement. It is included in the "all-dependencies" Docker images as it may be needed if DRAMSys is desired. Change-Id: I1a3e1a6fa2da4d0116d423e9267d4d3095000d4e	2023-11-26 17:10:40 -08:00
Bobby R. Bruce	575114b63b	ext: Add .gitignore to ext/dramsys Change-Id: Ifc1a3c77b56cbe5777d041a88b2c0d5cb77eaf89	2023-11-26 17:10:40 -08:00
Bobby R. Bruce	cb61d01ede	ext: Add 'cmake' dep check to DRAMSys install CMake is not required to build gem5. It is only required to build and link the optional DRAMSys library. Therefore, if the DRAMSys repo has been cloned but CMake is not present, this patch ensures no attempt at building or linking DRAMSys is made. A warning is thrown to inform the user of the missing CMake. Change-Id: I4d22e3a16655fd90f6b109b4e75859628f7d532d	2023-11-26 17:10:40 -08:00
Matthew Poremba	6e433ed885	mem-ruby: Fixes for new AtomicWait event in VIPER TCC (#585 ) The AtomicWait event was not being woken up properly due to the numPending count in the TBE not being decremented. This patch decrements the count when Data is returned. Since that moves to a base state, the TBE should no longer be needed. Additionally, added a transition which stalls and waits when an AtomicWait occurs while in WI state so that it retries. Change-Id: Ic8bfc700f9df3f95bea0799121898926a23d8163	2023-11-22 14:05:43 -08:00
Bobby R. Bruce	23a22ed95c	dev-amdgpu: Add VMID map to checkpoint (#570 ) When restoring checkpoints for certain applications, gem5 tries to create new doorbells with a pre-existing queue ID and simulation crashes shortly after. This commit adds existing IDs to the GPU device's used VMID map so that new doorbells are aware of existing queue IDs and use a new ID. This ensures that queue IDs are unique after checkpoint restoration	2023-11-22 10:05:21 -08:00
Giacomo Travaglini	098feb4042	arch-arm: Fix WFI sleeping in secure mode The CPU should not sleep with a pending virtual interrupt if secure mode EL2 is supported (FEAT_SEL2) Change-Id: Ib71c4a09d76a790331cf6750da45f83694946aee Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-11-21 13:39:41 +00:00
Giacomo Travaglini	b8fabc15d9	arch-arm: Revamp takeVirtualInt to take FEAT_SEL2 into account Similarly to the physical version [1], we rewrite the masking logic to account for FEAT_SEL2. The interrupt table is taken from the Arm architecture reference manual (version DDI 0487H.a, section D1.3.6, table R_BKHXL) [1]: https://github.com/gem5/gem5/pull/430 Change-Id: Icb6eb1944d8241293b3ef3c349b20f3981bcc558 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-11-21 13:39:41 +00:00
Giacomo Travaglini	49d07578de	arch-arm: Call take(Virtual)Int only when needed There is no need to call the methods for every kind of interrupt. A pending one should short-circuit the remaining checks Change-Id: I2c9eb680a7baa4644745b8cbe48183ff6f8e3102 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-11-21 13:39:41 +00:00
Giacomo Travaglini	bb323923f2	arch-arm: Simplify get/checkInterrupts with takeVirtualInt With this patch we align virtual interrupts with respect to the physical ones by introducing a matching takeVirtualInt method. Change-Id: Ib7835a21b85e4330ba9f051bc8fed691d6e1382e Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-11-21 13:39:41 +00:00
Giacomo Travaglini	3d41339366	arch-arm: Fix ISR_EL1 register read in secure mode Virtual interrupts are enabled in secure mode as well after the introduction of FEAT_SEL2. Replacing the secure mode check with the EL2Enabled one Change-Id: Id685a05d5adfa87b2a366f6be42bf344168927d4 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-11-21 13:39:41 +00:00
Giacomo Travaglini	90b711e879	arch-arm: Define an ISR type register Change-Id: I358050a507fb76654e87165720dfb3b2ea6ca838 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-11-21 13:39:41 +00:00
Hoa Nguyen	3009e0fb57	mem-ruby: Fix typo in CHI's Send_CompI (#579 ) The destination for the response is set twice.	2023-11-20 21:38:13 -08:00
Bobby R. Bruce	d772f3967b	dev: Fix `std::min` type mismatch in reg_bank.hh (#582 ) https://github.com/gem5/gem5/pull/386 included two cases in "src/dev/reg_bank.hh" where `std::min` was used to compare an integer of type `size_t` and another of type `Addr`. This causes an error on my Apple Silicon Mac as the comparison between an "unsigned long" and an "unsigned long long" is not permitted. To fix this issue this patch changes `reg_size` from `size_t` to `Addr`, as well as the types of the values it was derived from and the variable used to hold the return from the `std::min` calls. While not completely correct typing from a labelling perspective (`reg_bytes` is not an address), functions in "src/dev/reg_bank.hh" already abuse `Addr` in this way frequently (for example, `bytes` in the `write` function).	2023-11-20 21:37:45 -08:00
Bobby R. Bruce	f26867a075	mem-cache: Revert "Prefetchers Improvements" Reverts PR https://github.com/gem5/gem5/pull/564 Reverts commits: * `047a494c2b` * `2abd65c270` * `38045d7a25` * `6416304e07` * `8598764a03` Change-Id: Id523acc1778c3f827637302a6465f5a9e539d6b5	2023-11-20 19:49:04 -08:00
Vishnu Ramadas	06161ded8c	dev-amdgpu: Add VMID map to checkpoint When restoring checkpoints for certain applications, gem5 tries to create new doorbells with a pre-existing queue ID and simulation crashes shortly after. This commit checkpoints the existing VMID map so that any new doorbells after restoration use a unique queue ID Change-Id: I9bf89a2769db26ceab4441634ff2da936eea6d6f	2023-11-20 21:19:17 -06:00
Bobby R. Bruce	08c0d1f27a	dev: Fix `std::min` type mismatch in reg_bank.hh https://github.com/gem5/gem5/pull/386 included two cases in "src/dev/reg_bank.hh" where `std::min` was used to compare an integer of type `size_t` and another of type `Addr`. This caused an error on my Apple Silicon Mac as this is a comparison between an "unsigned long" and an "unsigned long long" which (at least on my setup) was not permitted. To fix this issue `reg_size` was changed from `size_t` to `Addr`, as well as the types of the values it was derived from and the variable used to hold the return from the `std::min` calls. Change-Id: I31e9c04a8e0327d4f6f5390bc5a743c629db4746	2023-11-20 17:33:44 -08:00
Matthew Poremba	3896673ddc	util: Bump GPUFS build docker to 5.4.2 (#571 ) This dockerfile is used to build applications (e.g., from gem5-resources) which can be run using full system mode in a GPU build. The next release's disk image will use ROCm 5.4.2, therefore bump the version from 4.2 to that version. Again, this is used to build input applications only and is not needed to run or compile gem5 with GPUFS. For example: $ docker build -t rocm54-build . /some/gem5-resources/src/gpu/lulesh$ docker run --rm -u $UID:$GID -v \ ${PWD}:${PWD} -w ${PWD} rocm54-build make Change-Id: If169c8d433afb3044f9b88e883ff3bb2f4bc70d2	2023-11-18 18:13:06 -08:00
Vishnu Ramadas	d19d6fc31e	dev-amdgpu: Add PM4 queue ID to GPU used VMID map When restoring checkpoints for certain applications, gem5 tries to create new doorbells with a pre-existing queue ID and simulation crashes shortly after. This commit adds existing IDs to the GPU device's used VMID map so that new doorbells are aware of existing queue IDs and use a new ID. This ensures that queue IDs are unique after checkpoint restoration Change-Id: I9bf89a2769db26ceab4441634ff2da936eea6d6f	2023-11-16 17:30:00 -06:00
Jason Lowe-Power	db6a869786	mem-cache: Prefetchers Improvements (#564 ) This pull request contains a set of small patches which fix some bugs in the gem5 prefetchers and align out-of-the-box prefetcher performance more closely with that which a typical user would expect. The performance patches have been tested with an out-of-the-box (untuned) Stride prefetcher configuration against a set of SPEC 2017 SimPoints, and show a modest IPC uplift across the board, with no IPC degradation. The new defaults were identified as part of work on gem5 prefetchers undertaken by Nikolaos Kyparissas while on internship at Arm.	2023-11-16 15:22:26 -08:00
Giacomo Travaglini	4ca2efac16	mem-ruby: AtomicNoReturn should check comp_anr instead of comp_wu (#545 ) The comp_anr parameter is currently unused. Both parameters (comp_wu and comp_anr) are set to false by default Change-Id: If09567504540dbee082191d46fcd53f1363d819f Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-11-16 15:20:51 -08:00
Matthew Poremba	4965367724	mem-ruby, gpu-compute: fix SQC/TCP requests to same line (#540 ) Currently, the GPU SQC (L1I$) and TCP (L1D$) have a performance bug where they do not behave correctly when multiple requests to the same cache line overlap one another. The intended behavior is that if the first request that arrives at the Ruby code for the SQC/TCP misses, it should send a request to the GPU TCC (L2$). If any requests to the same cache line occur while this first request is pending, they should wait locally at the L1 in the MSHRs (TBEs) until the first request has returned. At that point they can be serviced, and assuming the line has not been evicted, they should hit. For example, in the following test (on 1 GPU thread, in 1 WG): load Arr[0] load Arr[1] load Arr[2] The expected behavior (confirmed via profiling on real GPUs) is that we should get 1 miss (Arr[0]) and 2 hits (Arr[1], Arr[2]) for such a program. However, the current support in the VIPER SQC/TCP code does not model this correctly. Instead it lets all 3 concurrent requests go straight through to the TCC instead of stopping the Arr[1] and Arr[2] requests locally while Arr[0] is serviced. This causes all 3 requests to be classified as misses. To resolve this, this patch adds support into the SQC/TCP code to prevent subsequent, concurrent requests to a pending cache line from being sent in parallel with the original one. To do this, we add an additional transient state (IV) to indicate that a load is pending to this cache line. If a subsequent request of any kind to the same cache line occurs while this load is pending, the requests are put on the local wait buffer and woken up when the first request returns to the SQC/TCP. Likewise, when the first load is returned to the SQC/TCP, it transitions from IV --> V. 
As part of this support, additional transitions were also added to account for corner cases such as what happens when the line is evicted by another request that maps to the same set index while the first load is pending (the line is immediately given to the new request, and when the load returns it completes, wakes up any pending requests to the same line, but does not attempt to change the state of the line) and how GPU bypassing loads and stores should interact with the pending requests (they are forced to wait if they reach the L1 after the pending, non-bypassing load; but if they reach the L1 before the non-bypassing load then they make sure not to change the state of the line from IV if they return before the non-bypassing load). As part of this change, we also move the MSHR behavior from internally in the GPUCoalescer for loads to the Ruby code (like all other requests). This is important to get correct hits and misses in stats and other prints, since the GPUCoalescer MSHR behavior assumes all requests serviced out of its MSHR also miss if the original request to that line missed. Although the SQC does not support stores, the TCP does. Thus, we could have applied a similar change to the GPU stores at the TCP. However, since the TCP support assumes write-through caches and does not attempt to allocate space in the TCP, we elected not to add this support since it seems to run contrary to the intended behavior (i.e., the intended behavior seems to be that writes just bypass the TCP and thus should not need to wait for another write to the same cache line to complete). Additionally, making these changes introduced issues with deadlocks at the TCC. Specifically, some Pannotia applications have accesses to the same cache line where some of the accesses are GLC (i.e., they bypass the GPU L1 cache) and others are non-GLC (i.e., they want to be cached in the GPU L1 cache). We have support already per CU in the above code. 
However, the problem here is that these requests are coming from different CUs and happening concurrently (seemingly because different WGs are at different points in the kernel around the same time). This causes a problem because our support at the TCC for the TBEs overwrites the information about the GPU bypassing bits (SLC, GLC) every time. The problem is when the second (non-GLC) load reaches the TCC, it overwrites the SLC/GLC information for the first (GLC) load. Thus, when the first load returns from the directory/memory, it no longer has the GLC bit set, which causes an assert failure at the TCP. After talking with other developers, it was decided the best way to handle this and attempt to model real hardware more closely was to move the point at which requests are put to sleep on the wakeup buffer from the TCC to the directory. Accordingly, this patch includes support for that -- now when multiple loads (bypassing or non-bypassing) from different CUs reach the directory, all but the first one will be forced to wait there until the first one completes, then will be woken up and performed. This required updating the WTRequestor information at the TCC to pass the information about what CU performed the original request for loads as well (otherwise, since the TBE can be updated by multiple pending loads, we can't tell where to send the final result to). Thus, I changed the field to be named CURequestor instead of WTRequestor since it is now used for more than stores. Moreover, I also updated the directory to take this new field and the GLC information from incoming TCC requests and then pass that information back to the TCC on the response -- without doing this, because the TBE can be updated by multiple pending, concurrent requests, we cannot determine if this memory request was a bypassing or non-bypassing request. 
Finally, these changes introduced a lot of additional contention and protocol stalls at the directory, so this patch converted all directory uses of z_stall to instead put requests on the wakeup buffer (and wake them up when the current request completes) instead. Without this, protocol stalls cause many applications to deadlock at the directory. However, this exposed another issue at the TCC: other applications (e.g., HACC) have a mix of atomics and non-atomics to the same cache line in the same kernel. Since the TCC transitions to the A state when an atomic arrives. For example, after the first pending load returns to the TCC from the directory, which causes the TCC state to become V, but when there are still other pending loads at the TCC. This causes invalid transition errors at the TCC when those pending loads return, because the A state thinks they are atomics and decrements the pending atomic count (plus the loads are never sent to the TCP as returning loads). This patch fixes this by changing the TCC TBEs to model the number of pending requests, and not allowing atomics to be issued from the TCC until all prior, pending non-atomic requests have returned. Change-Id: I37f8bda9f8277f2355bca5ef3610f6b63ce93563	2023-11-16 14:24:00 -08:00
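The intended SQC/TCP behavior for overlapping requests to one cache line (1 miss and 2 hits for the `Arr[0..2]` example) can be sketched with a toy model. This is not the VIPER SLICC code: `ToyL1`, `run_example`, the 64-byte line size, and the 4-byte element size are assumptions chosen for illustration; only the state names I, IV, and V come from the commit message.

```python
LINE_BYTES = 64  # assumed cache line size


class ToyL1:
    """Toy L1 with an IV transient state and a local wait buffer."""

    def __init__(self, coalesce):
        self.coalesce = coalesce  # False mimics the old behavior
        self.state = {}           # line -> "IV" or "V"; absent means I
        self.waiting = {}         # line -> addresses stalled locally
        self.hits = 0
        self.misses = 0

    def request(self, addr):
        line = addr // LINE_BYTES
        st = self.state.get(line, "I")
        if st == "V":
            self.hits += 1
        elif st == "IV" and self.coalesce:
            # A load to this line is already pending: wait at the L1.
            self.waiting.setdefault(line, []).append(addr)
        else:
            # Miss: forward to the L2 and mark the line as pending.
            self.misses += 1
            self.state[line] = "IV"

    def response(self, line):
        # First load returns: IV --> V, then wake the stalled requests.
        self.state[line] = "V"
        for addr in self.waiting.pop(line, []):
            self.request(addr)


def run_example(coalesce):
    l1 = ToyL1(coalesce)
    for i in range(3):        # load Arr[0], Arr[1], Arr[2] (4-byte elements)
        l1.request(i * 4)
    l1.response(0)            # data for the shared line returns
    return l1.misses, l1.hits
```

`run_example(True)` returns `(1, 2)`, matching the expected hit/miss counts, while `run_example(False)` returns `(3, 0)`, the misclassification the commit describes.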
Bobby R. Bruce	bfe899e48e	stdlib, resources: Update JSON data in workload (#532 ) - resources field in workload now supports a dict with resources id and version. - Older workload JSON is still supported, but a deprecation warning was added	2023-11-16 10:11:13 -08:00
Giacomo Travaglini	047a494c2b	mem-cache: Optimize strided prefetcher address generation This commit optimizes the address generation logic in the strided prefetcher by introducing the following changes (d is the degree of the prefetcher) * Evaluate the fixed prefetch_stride only once (and not d-times) * Replace 2d multiplications (d * prefetch_stride and distance * prefetch_stride) with additions by updating the new base prefetch address while looping Change-Id: I49c52333fc4c7071ac3d73443f2ae07bfcd5b8e4 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Richard Cooper <richard.cooper@arm.com> Reviewed-by: Tiberiu Bucur <tiberiu.bucur@arm.com>	2023-11-16 09:48:15 +00:00
Nikolaos Kyparissas	2abd65c270	mem: added distance parameter to stride prefetcher The Stride Prefetcher will skip this number of strides ahead of the first identified prefetch, then generate `degree` prefetches at `stride` intervals. A value of zero indicates no skip (i.e. start prefetching from the next identified prefetch address). This parameter can be used to increase the timeliness of prefetches by starting to prefetch far enough ahead of the demand stream to cover the memory system latency. [Richard Cooper <richard.cooper@arm.com>: - Added detail to commit comment and `distance` Param documentation. - Changed `distance` Param from `Param.Int` to `Param.Unsigned`. ] Change-Id: I6c4e744079b53a7b804d8eab93b0f07b566f0c08 Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Signed-off-by: Richard Cooper <richard.cooper@arm.com>	2023-11-16 09:48:09 +00:00
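One reading of the `distance` and `degree` parameters described in 2abd65c270, combined with the running-addition scheme from 047a494c2b, is sketched below. `stride_prefetch_addrs` is a hypothetical helper, not the gem5 prefetcher code, and the exact address arithmetic is an assumption drawn from the commit text.

```python
def stride_prefetch_addrs(addr, stride, degree, distance=0):
    """Return `degree` prefetch addresses at `stride` intervals.

    `distance` strides are skipped first; distance=0 starts at the
    next address after `addr`. The loop uses a running addition
    rather than one multiplication per generated prefetch.
    """
    next_addr = addr + distance * stride  # single multiply for the skip
    addrs = []
    for _ in range(degree):
        next_addr += stride
        addrs.append(next_addr)
    return addrs
```

With `addr=0x1000`, `stride=64`, and `degree=3`, the default distance gives `0x1040, 0x1080, 0x10C0`, and `distance=2` shifts the window to `0x10C0, 0x1100, 0x1140`.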
Yu-Cheng Chang	ceabe86b31	arch-riscv: Add overrides to RISC-V Interrupts class (#568 )	2023-11-15 18:36:15 -08:00
Matt Sinclair	c3326c78e6	mem-ruby, gpu-compute: fix SQC/TCP requests to same line Currently, the GPU SQC (L1I$) and TCP (L1D$) have a performance bug where they do not behave correctly when multiple requests to the same cache line overlap one another. The intended behavior is that if the first request that arrives at the Ruby code for the SQC/TCP misses, it should send a request to the GPU TCC (L2$). If any requests to the same cache line occur while this first request is pending, they should wait locally at the L1 in the MSHRs (TBEs) until the first request has returned. At that point they can be serviced, and assuming the line has not been evicted, they should hit. For example, in the following test (on 1 GPU thread, in 1 WG): load Arr[0] load Arr[1] load Arr[2] The expected behavior (confirmed via profiling on real GPUs) is that we should get 1 miss (Arr[0]) and 2 hits (Arr[1], Arr[2]) for such a program. However, the current support in the VIPER SQC/TCP code does not model this correctly. Instead it lets all 3 concurrent requests go straight through to the TCC instead of stopping the Arr[1] and Arr[2] requests locally while Arr[0] is serviced. This causes all 3 requests to be classified as misses. To resolve this, this patch adds support into the SQC/TCP code to prevent subsequent, concurrent requests to a pending cache line from being sent in parallel with the original one. To do this, we add an additional transient state (IV) to indicate that a load is pending to this cache line. If a subsequent request of any kind to the same cache line occurs while this load is pending, the requests are put on the local wait buffer and woken up when the first request returns to the SQC/TCP. Likewise, when the first load is returned to the SQC/TCP, it transitions from IV --> V. 
As part of this support, additional transitions were also added to account for corner cases such as what happens when the line is evicted by another request that maps to the same set index while the first load is pending (the line is immediately given to the new request, and when the load returns it completes, wakes up any pending requests to the same line, but does not attempt to change the state of the line) and how GPU bypassing loads and stores should interact with the pending requests (they are forced to wait if they reach the L1 after the pending, non-bypassing load; but if they reach the L1 before the non-bypassing load then they make sure not to change the state of the line from IV if they return before the non-bypassing load). As part of this change, we also move the MSHR behavior from internally in the GPUCoalescer for loads to the Ruby code (like all other requests). This is important to get correct hits and misses in stats and other prints, since the GPUCoalescer MSHR behavior assumes all requests serviced out of its MSHR also miss if the original request to that line missed. Although the SQC does not support stores, the TCP does. Thus, we could have applied a similar change to the GPU stores at the TCP. However, since the TCP support assumes write-through caches and does not attempt to allocate space in the TCP, we elected not to add this support since it seems to run contrary to the intended behavior (i.e., the intended behavior seems to be that writes just bypass the TCP and thus should not need to wait for another write to the same cache line to complete). Additionally, making these changes introduced issues with deadlocks at the TCC. Specifically, some Pannotia applications have accesses to the same cache line where some of the accesses are GLC (i.e., they bypass the GPU L1 cache) and others are non-GLC (i.e., they want to be cached in the GPU L1 cache). We have support already per CU in the above code. 
However, the problem here is that these requests are coming from different CUs and happening concurrently (seemingly because different WGs are at different points in the kernel around the same time). This causes a problem because our support at the TCC for the TBEs overwrites the information about the GPU bypassing bits (SLC, GLC) every time. The problem is when the second (non-GLC) load reaches the TCC, it overwrites the SLC/GLC information for the first (GLC) load. Thus, when the first load returns from the directory/memory, it no longer has the GLC bit set, which causes an assert failure at the TCP. After talking with other developers, it was decided the best way to handle this and attempt to model real hardware more closely was to move the point at which requests are put to sleep on the wakeup buffer from the TCC to the directory. Accordingly, this patch includes support for that -- now when multiple loads (bypassing or non-bypassing) from different CUs reach the directory, all but the first one will be forced to wait there until the first one completes, then will be woken up and performed. This required updating the WTRequestor information at the TCC to pass the information about what CU performed the original request for loads as well (otherwise, since the TBE can be updated by multiple pending loads, we can't tell where to send the final result to). Thus, I changed the field to be named CURequestor instead of WTRequestor since it is now used for more than stores. Moreover, I also updated the directory to take this new field and the GLC information from incoming TCC requests and then pass that information back to the TCC on the response -- without doing this, because the TBE can be updated by multiple pending, concurrent requests, we cannot determine if this memory request was a bypassing or non-bypassing request. 
Finally, these changes introduced a lot of additional contention and protocol stalls at the directory, so this patch converted all directory uses of z_stall to instead put requests on the wakeup buffer (and wake them up when the current request completes) instead. Without this, protocol stalls cause many applications to deadlock at the directory. However, this exposed another issue at the TCC: other applications (e.g., HACC) have a mix of atomics and non-atomics to the same cache line in the same kernel. Since the TCC transitions to the A state when an atomic arrives. For example, after the first pending load returns to the TCC from the directory, which causes the TCC state to become V, but when there are still other pending loads at the TCC. This causes invalid transition errors at the TCC when those pending loads return, because the A state thinks they are atomics and decrements the pending atomic count (plus the loads are never sent to the TCP as returning loads). This patch fixes this by changing the TCC TBEs to model the number of pending requests, and not allowing atomics to be issued from the TCC until all prior, pending non-atomic requests have returned. Change-Id: I37f8bda9f8277f2355bca5ef3610f6b63ce93563	2023-11-15 19:23:51 -06:00
Matt Sinclair	065ddf759f	mem-ruby, gpu-compute: fix bug with GPU bypassing loads The current GPU TCP (L1D$) Ruby SLICC code had a bug where a GPU load that wants to bypass the L1D$ (e.g., GLC or SLC bit was set) but finds the line in Invalid when the request arrives results in a non-bypassing load being sent to the GPU TCC (L2$) instead of a bypassing load. This issue was not caught by the current nightly or weekly tests, because the tests do not test for correctness in terms of hits and misses in the caches. However, tests for these corner cases expose this issue. To fix this, this patch removes the check that the entry is valid when deciding what to do with a bypassing GPU load -- since the TCP Ruby code has transitions for bypassing loads in both I and V, we can simply call the LoadBypassEvict event in both cases and the appropriate transition will handle the bypassing load given the cache line's current state in the TCP. Change-Id: Ia224cefdf56b4318b2bcbd0bed995fc8d3b62a14	2023-11-15 19:23:51 -06:00
hungweihsuG	83f1fe3fec	dev: add debug flag in register bank. (#386 ) Print extra logs for the full/partial read/write access to the registers through the register bank. The debug flag is empty by default and would not print anything. Test: run unittest of dev/reg_bank.test.xml to check the behavior would not affect the original functionality. run gem5 with debug flags and use m5term to poke on registers.	2023-11-15 10:04:46 -08:00
wmin0	a8440f367d	arch-riscv: Move fault handler addr logic to ISA (#554 ) mtvec.mode is extended in new RISC-V proposals, such as fast interrupts. This change moves that logic from the Fault class to the ISA class for extensibility. Ref: https://github.com/riscv/riscv-fast-interrupt	2023-11-15 10:04:01 -08:00
BujSet	4a5ec70e08	gpu-compute: Minor edits for atomic no returns and stores (#565 ) Since returned data is not needed for AtomicNoReturn and Store memory requests, the coalescer need not spend time writing in dummy data for packets of these types. Change-Id: Ie669e8c2a3bf44b5b0c290f62c49c5d4876a9a6a	2023-11-15 07:20:07 -08:00
Bobby R. Bruce	30787b59d4	tests: Remove multiple suites per job for Weekly tests (#562 ) I believe the weekly test failures (example: https://github.com/gem5/gem5/actions/runs/6832805510/job/18592876184) are due to a container running out of memory when running the very-long x86 boot tests. I found that the `-t $(nproc)` flag meant, on our runners, 4 x86 full system gem5 simulations were being spawned. Locally I found these gem5 x86 boot sims can reach 4GB in size, so I suspect they eventually grew big enough to exceed the 16GB memory of the VM. I have removed `-t $(nproc)`, meaning each job runs single-threaded, to see if this fixes the issue (we may want to use `-t 2` later if the Weeklies take too long running single-threaded).	2023-11-14 11:00:07 -08:00
Bobby R. Bruce	8859592893	tests,gpu-compute: Fix Lulesh 'Obtain LULESH' step (#563 ) The `working-directory: ${{ github.workspace }}` line was included by mistake and resulted in this step failing as the command was being executed in the wrong directory. Example failure: https://github.com/gem5/gem5/actions/runs/6832831307/job/18593080567	2023-11-14 08:43:00 -08:00
Derek Christ	e95cab429f	configs,ext,stdlib: Update DRAMSys integration (#525 ) Recent breaking changes in the DRAMSys API require user code to be updated. These updates have been applied to the gem5 integration. Furthermore, as DRAMSys started to use CMake dependency management, it is no longer sensible to maintain two separate build systems for DRAMSys. The use of the DRAMSys integration in gem5 will therefore from now on require that CMake is installed on the target machine. Additionally, support for snapshots has been implemented in DRAMSys and coupled with gem5's checkpointing API.	2023-11-14 08:05:11 -08:00
Derek Christ	99553fdbee	systemc: Fix two bugs in gem5-to-tlm bridge (#542 ) This commit fixes a violation of the TLM2.0 protocol as well as a bug regarding back-pressure: - In the BEGIN_REQ phase, transaction objects are required to set their response status to TLM_INCOMPLETE_RESPONSE. This was not the case in the packet2payload function that converts gem5 packets to TLM2.0 generic payloads. - When the target applies back-pressure to the initiator, an assert condition was triggered as soon as the response is retried. The cause of this was an unintentional nullptr-access into a map.	2023-11-14 08:02:58 -08:00
BujSet	65b44e6516	mem-ruby: Fix for not creating log entries on atomic no return requests (#546 ) Augment Datablock and WriteMask to support an optional arg that distinguishes between return and no return. In the case of atomic no return requests, log entries should not be created when performing the atomic. Change-Id: Ic3112834742f4058a7aa155d25ccc4c014b60199a	2023-11-14 07:54:42 -08:00
Daniel Kouchekinia	be5c03ea9f	mem-ruby,configs: Add GPU GLC Atomic Resource Constraints (#120 ) Added a resource constraint, AtomicALUOperation, to GLC atomics performed in the TCC. The resource constraint uses a new class, ALUFreeList array. The class assumes the following: - There are a fixed number of atomic ALU pipelines - While a new cache line can be processed in each pipeline each cycle, if a cache line is currently going through a pipeline, it can't be processed again until it's finished Two configuration parameters have been used to tune this behavior: - tcc-num-atomic-alus corresponds to the number of atomic ALU pipelines - atomic-alu-latency corresponds to the latency of atomic ALU pipelines Change-Id: I25bdde7dafc3877590bb6536efdf57b8c540a939	2023-11-14 07:48:48 -08:00
Nikolaos Kyparissas	38045d7a25	mem-cache: Added clean eviction check for prefetchers. pkt->req->isCacheMaintenance() would not include a check for clean eviction before notifying the prefetcher, causing gem5 to crash. Change-Id: I1d082a87a3908b1ed46c5d632d45d8b09950b382 Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-11-14 15:20:52 +00:00
Richard Cooper	6416304e07	mem-cache: Update default prefetch options. Update the default prefetch options to achieve out-of-the box prefetcher performance closer to that which a typical user would expect. Configurations that set these parameters explicitly will be unaffected. The new defaults were identified as part of work on gem5 prefetchers undertaken by Nikolaos Kyparissas while on internship at Arm. Change-Id: Id63868c7c8f00ee15a0b09a6550780a45ae67e55 Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-11-14 15:20:52 +00:00
Richard Cooper	8598764a03	mem-cache: Squash prefetch queue entries by block address. Prefetch queue entries were being squashed by comparing the address of each queued prefetch against the block address of the demand access. Only prefetches that happen to fall on a cache-line block boundary would be squashed. This patch converts the prefetch addresses to block addresses before comparison. Change-Id: I55ecb4919e94ad314b91c7795bba257c550b1528 Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-11-14 15:20:52 +00:00
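The block-address comparison described in 8598764a03 can be sketched as follows. This is a minimal Python illustration, not the gem5 C++ code: the names `block_address`, `squash_by_block`, and `squash_by_raw_address` are hypothetical, and the sketch assumes a power-of-two block size and a prefetch queue that is a plain list of addresses.

```python
def block_address(addr, blk_size):
    """Align addr down to the start of its cache block.

    Assumes blk_size is a power of two (hypothetical helper).
    """
    return addr & ~(blk_size - 1)


def squash_by_raw_address(queue, demand_addr, blk_size):
    """Behaviour described as buggy: raw prefetch address vs. demand block."""
    demand_blk = block_address(demand_addr, blk_size)
    return [pf for pf in queue if pf != demand_blk]


def squash_by_block(queue, demand_addr, blk_size):
    """Behaviour described as fixed: both sides converted to block addresses."""
    demand_blk = block_address(demand_addr, blk_size)
    return [pf for pf in queue if block_address(pf, blk_size) != demand_blk]


queue = [0x1000, 0x1010, 0x1040, 0x2008]
# With 64-byte blocks, a demand access to 0x1020 falls in block 0x1000.
remaining_raw = squash_by_raw_address(queue, 0x1020, 64)
remaining_blk = squash_by_block(queue, 0x1020, 64)
```

With the raw comparison, only the prefetch that sits exactly on the block boundary (0x1000) is squashed and 0x1010 survives; with the block comparison, both entries in block 0x1000 are removed.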
Yu-Cheng Chang	f11227b4a0	systemc: Fix gcc13 systemC compilation error (#520 ) issue: https://github.com/gem5/gem5/issues/472	2023-11-14 03:54:35 -08:00
Bobby R. Bruce	6ac6d0c340	tests,misc: Add "build/ALL/gem5.fast" Clang compilation to CI (#432 ) While we do run compiler tests weekly, 9 times out of 10 the issue is a strict check in clang that we did not run before incorporating code into the codebase. Therefore, running a clang compilation as part of our CI would help us catch errors more quickly.	2023-11-14 03:53:28 -08:00
Daniel Kouchekinia	dde3d10aea	cpu: Remove SLC bit restraint for GPU tester (#552 ) This reverts gem5#133, the temporary work-around for gem5#131, allowing both SLC and GLC atomic requests to be made in the GPU tester. The underlying issues behind gem5#131 have been resolved by gem5#367 and gem5#397.	2023-11-14 03:47:34 -08:00
Rajarshi Das	f71450d26d	python,util: Fix magic number check in decode_inst_dep_trace.py (#560 ) The decode_inst_dep_trace.py opens the trace file in read mode, and subsequently reads the magic number from the trace file. Once the number is read, it is compared against the string 'gem5' without decoding it first. This causes the comparison to fail. The fix addresses this by calling the decode() routine on the output of the read() call. Please find the details in the associated issue #543	2023-11-14 03:47:04 -08:00
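The failure mode described in f71450d26d can be reproduced in isolation. The sketch below is not the code from decode_inst_dep_trace.py: the helper `has_gem5_magic` is hypothetical, an in-memory `io.BytesIO` stands in for the trace file, and a stream that yields `bytes` is assumed.

```python
import io

MAGIC = "gem5"  # the string the commit message says the magic number is compared against


def has_gem5_magic(stream):
    """Read the first bytes of the stream and compare them, decoded, to MAGIC.

    Hypothetical helper; assumes stream.read() returns bytes.
    decode() raises UnicodeDecodeError if those bytes are not valid UTF-8.
    """
    return stream.read(len(MAGIC)).decode() == MAGIC


# Stand-in for a trace file: magic number followed by arbitrary payload bytes.
trace = io.BytesIO(b"gem5" + b"\x00\x01")

# Without decode(), a bytes object never equals a str, so the check fails.
undecoded_matches = trace.read(len(MAGIC)) == MAGIC

trace.seek(0)
decoded_matches = has_gem5_magic(trace)
```

In Python 3, `b"gem5" == "gem5"` evaluates to `False` rather than raising an error, which is why the comparison fails silently until the bytes are decoded.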

20936 Commits