The 'max_dequeue_rate' parameter limits the rate at which messages can
be dequeued in a single cycle. When set, 'isReady' returns false if
after max_dequeue_rate is reached.
This can be used to fine tune the performance of cache controllers.
For the record, other ways of achieving a similar effect could be:
1) Modifying the SLICC compiler to limit message consumption in the
generated wakeup() function
2) Set the buffer size to max_dequeue_rate. This can potentially cut the
the expected throughput in half. For instance if a producer can
enqueue every cycle, and a consumer can dequeue every cycle, a
message can only be actually enqueued every two (assuming
buffer_size=1) since the buffer entries available after dequeue
are only visible in the next cycle (even if the consumer executes
before the producer).
JIRA: https://gem5.atlassian.net/browse/GEM5-920
Change-Id: I3a446c7276b80a0e3f409b4fbab0ab65ff5c1f81
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/41862
Reviewed-by: Meatboy 106 <garbage2collector@gmail.com>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
Three changes below:
1. The m_stall_time was declared as statistics::Average, but
statistics::Average uses AvgStor as storage and this works as per-tick
average stat. In the case of m_stall_time, Scalar should be used to get
the calculation right.
2. The function used to get an enqueue time was changed since the
getTime() returns the time when the message was created.
3. Record the stall time only when the message is really dequeued
from the buffer (stall time is not evaluated when the message is moved
to stall map).
Change-Id: I090d19828b5c43f0843a8b735d3f00f312c436e9
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/54363
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Remove the line "For use for simulation and test purposes only" in files
were AMD is the only copyright holder listed in the header. This happens
to be the case for all files where this line exists, removing it
completely from gem5.
Change-Id: I623f266b002f564301b28774f49081099cfc60fd
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53943
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
In order to fix several regression failures [1] the master/slave
terminology in src/cpu/BaseCPU.py was reintroduced [2].
This patch is addressing the issue by providing 2 different
ways of connecting cpu ports:
*) connectBus: The method assumes an object with a bus interface is
passed as an argument, therefore it tries to bind cpu ports to the
bus.mem_side_ports and bus.cpu_side_ports
*) connectAllPorts: No assumption on the port owning device is made.
The method simply accepts ports as arguments which will be directly
connected to the peer cpu ports
This will be used for example by ruby Sequencers
[1]: https://gem5.atlassian.net/browse/GEM5-775
[2]: https://gem5-review.googlesource.com/c/public/gem5/+/34495
Change-Id: I715ab8471621d6e5eb36731d7eaefbedf9663a71
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/52584
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
This was conditioned on the TARGET_ISA being x86 because the code it
replaced was, and that was because the x86 interrupts object had an
extra port that didn't appear for other ISAs. This inconsistency is not
present on either side of this connection, and so we don't need it to be
conditional.
We do, however, need to ensure that the port sends a range change even
if it doesn't have any ranges to send, to satisfy the bookkeeping of the
bus on the other side of the connection. We do that in init, like leaf
devices do.
Change-Id: Idec6f6c5e2cf78b113fb238d0edd2c63d6cd2c23
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/52109
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Some aspects of the formatting in this file were questionable, like
aligning =s between adjacent lines, although not technically against the
style rules as far as I know.
More strangely though, the whole file used three space indents instead
of the typical four.
Change-Id: I7b60f1978c5b2c60a15296b10d09d5701cf7fa5c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/52108
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Currently MOESI_AMD_Base used in VIPER has a CPUonly parameter which
indicates that messages should not try to add GPU SLICC controllers as
destinations. This adds the analogue GPUonly parameter which indicates
that requests should not try to add CPU SLICC controllers.
Also adds an assert to ensure the outgoing message has at least one
destination. This assert would indicate a misconfiguration.
Change-Id: Ibb0affd4606084fca021f0e7c117d4ff8c06d429
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51928
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
The CPUonly variable in MOESI_AMD_Base's Directory indicates that probes
should not be sent to any GPU SLICC controllers as they are not part of
CPU. There is one CPUonly check missing which causes problems in
GPU-only Ruby networks as there is no route to any controllers with that
MachineType. Add a condition to check CPUonly and do nothing in that
case.
Change-Id: I41b6c04feec473e34b04402adfb5978e75b847b6
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51927
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
RISC-V atomics carry a atomic functor that needs to be executed in the
cache hierarchy. To implement this in Ruby, we execute the functor in
the hitCallback function. Note that these functions are slightly
different than the atomic functions used in the GPU model and the GPU
coalescer even though they have similar semantics.
This change was tested with RISC-V Linux boot which has a few atomics
and linux boot finishes successfully. Previously, the boot got stuck
after the incorrect atomic operation.
Change-Id: I47a69c05ad9f4267d0220023289116e62b5231be
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51447
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Currently, the GPU VIPER TCC protocol handles races between atomics in
the triggerQueue_in. This in_port does not check for resource
availability, which can cause the trigger queue to execute multiple
times. Although this is the expected behavior, the code for handling
atomic races decrements the atomicDoneCnt flag in the trigger queue,
which is not safe since resource contention may cause it to execute
multiple times.
To resolve this issue, this commit moves the decrementing of this
counter to a new action that is called in an event that happens only
when the race between atomics is detected.
Change-Id: I552fd4f34fdd9ebeec99fb7aeb4eeb7b150f577f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51368
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
In the GPU VIPER TCC, programs with mixes of atomics and data
accesses to the same address, in the same kernel, can experience
deadlock when large applications (e.g., Pannotia's graph analytics
algorithms) are running on very small GPUs (e.g., the default 4 CU GPU
configuration). In this situation, deadlocks occur due to resource
stalls interacting with the behavior of the current implementation for
handling races between atomic accesses. The specific order of events
causing this deadlock are:
1. TCC is waiting on an atomic to return from directory
2. In the meantime it receives another atomic to the same address -- when
this happens, the TCC increments number of atomics to this address
(numAtomics = 2) that are pending in TBE, and does a write through of the
atomic to the directory.
3. When the first atomic returns from the Directory, it decrements the
numAtomics counter. numAtomics was at 2 though, because of step #2. So
it doesn't deallocate the TBE entry and calls Event:AtomicNotDone.
4. Another request (a LD) to the same address comes along for the same
address. The LD does z_stall since the second atomic is pending –- so the
LD retries every cycle until the deadlock counter times out (or until the
second atomic comes back).
5. The second atomic returns to the TCC. However, because there are so
many LD's pending in the cache, all doing z_stall's and retrying every cycle,
there are a lot of resource stalls. So, when the second atomic returns, it is
forced to retry its operation multiple times -- and each time it decrements
the atomicDoneCnt flag (which was added to catch a race between atomics
arriving and leaving the TCC in 7246f70bfb) repeatedly. As a result
atomicDoneCnt becomes negative.
6. Since this atomicDoneCnt flag is used to determine when Event:AtomicDone
happens, and since the resource stalls caused the atomicDoneCnt flag to become
negative, we never complete the atomic. Which means the pending LD can never
access the line, because it's stuck waiting for the atomic to complete.
7. Eventually the deadlock threshold is reached.
To fix this issue, this commit changes the VIPER TCC protocol from using
z_stall to using the stall_and_wait buffer method that the
Directory-level of the SLICC already uses. This change effectively
prevents resource stalls from dominating the TCC level, by putting
pending requests for a given address in a per-address stall buffer.
These requests are then woken up when the pending request returns.
As part of this change, this change also makes two small changes to the
Directory-level protocol (MOESI_AMD_BASE-dir):
1. Updated the names of the wakeup actions to match the TCC wakeup actions,
to avoid confusion.
2. Changed transition(B, UnblockWriteThrough, U) to check all stall buffers,
as some requests were being placed later in the stall buffer than was
being checked. This mirrors the changes in 187c44fe44 to other Directory
transitions to resolve races between GPU and DMA requests, but for
transitions prior workloads did not stress.
Change-Id: I60ac9830a87c125e9ac49515a7fc7731a65723c2
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51367
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
X86 had a private/arch specific request flag called StoreCheck which it
used to signal to the TLB that it should fault on a load if it would
have faulted had it been a store. That way, you can detect whether a
read-modify-write type of operation is going to fail due to a
translation problem during the read, and don't have to worry about not
doing anything architecturally visible until the store had succeeded,
while also making sure not to do the store part if the modify part
could fail.
It seems that Ruby had hijacked that flag and had an architecture
specific check which was looking for a load which was going to be
followed by a store. The x86 flag was never intended to communicate that
beyond the TLB, and this nominally architecture agnostic component
shouldn't be reaching into the ISA specific flags to try to get that
information.
Instead, this change introduces a new Request flag called
READ_MODIFY_WRITE which is used for the same purpose in x86, but in
general means that a load will be followed by a write in the near
future.
With this new globally applicable flag, the ruby Sequencer class no
longer needs to check what the arch is, nor does it need to access ISA
private data in the request flags. Always doing this check should be no
less efficient than before, because checking the arch involved calling
into the system object, while checking the flag only requires masking a
bit on the flags which the compiler probably already has floating around
for other logic in this function.
Change-Id: Ied5b744d31e7aa8bf25e399b6b321f9d2020a92f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/48710
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Maintainer: Gabe Black <gabe.black@gmail.com>
Ruby assumes protocols use directory controllers as memory interface.
Thus, recvAtomic() uses the machine type of directory when it calls
mapAddressToMachine(). However, it doesn't work for CHI since
CHI does not use directory controllers as memory controller interface.
Therefore, the code was modified to check which controller type is used
for memory interface between MachineType_Directory and
MachineType_Memory, which is used for CHI.
Change-Id: If35a06a8a3772ce5e5b994df05c9d94c7770c90d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/48403
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Previously, we assumed that the maximum number of requests that would be
issued by an instruction was equal to the number of threads that were
active for that instruction.
However, if a thread has an access that crosses a cache line, that
thread has a misaligned access, and needs to request both cache lines.
This patch takes that into account by checking the status vector for
each thread in that instruction to determine the number of requests.
Change-Id: I1994962c46d504b48654dbd22bcd786c9f382fd9
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/48341
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Apply the gem5 namespace to the codebase.
Some anonymous namespaces could theoretically be removed,
but since this change's main goal was to keep conflicts
at a minimum, it was decided not to modify much the
general shape of the files.
A few missing comments of the form "// namespace X" that
occurred before the newly added "} // namespace gem5"
have been added for consistency.
std out should not be included in the gem5 namespace, so
they weren't.
ProtoMessage has not been included in the gem5 namespace,
since I'm not familiar with how proto works.
Regarding the SystemC files, although they belong to gem5,
they actually perform integration between gem5 and SystemC;
therefore, it deserved its own separate namespace.
Files that are automatically generated have been included
in the gem5 namespace.
The .isa files currently are limited to a single namespace.
This limitation should be later removed to make it easier
to accomodate a better API.
Regarding the files in util, gem5:: was prepended where
suitable. Notice that this patch was tested as much as
possible given that most of these were already not
previously compiling.
Change-Id: Ia53d404ec79c46edaa98f654e23bc3b0e179fe2d
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/46323
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Device memories are used for PCI devices which have their own pools of
backing store memory such as amdgpu device. The check for an address
being in device memory previously did not handle multiple interleaved
memory devices with the same address range. Therefore, the device memory
check would fail if the interleaving masks did not match. This updates
the method to iterate through all device memories that handle the
RequestorID and returns true if any of the device memories contain the
packet address.
Change-Id: I9339d39c1cb54a5b9075c4a122c118fe61dc6fdb
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/46381
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
As part of recent decisions regarding namespace
naming conventions, all namespaces will be changed
to snake case.
::Stats became ::statistics.
"statistics" was chosen over "stats" to avoid generating
conflicts with the already existing variables (there are
way too many "stats" in the codebase), which would make
this patch even more disturbing for the users.
Change-Id: If877b12d7dac356f86e3b3d941bf7558a4fd8719
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/45421
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The GPU VIPER TCC protocol accidentally used "TiggerMsg" instead
of "TriggerMsg" for the triggerQueue_in port. This was a benign
bug beacuse the msg type is not used in the in_port implementation
but still makes the SLICC harder to understand, so fixing it is
worthwhile.
Change-Id: I88cbc72bac93bcc58a66f057a32f7bddf821cac9
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/44905
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The Bridge SimObject, which hosts the CDC and SerDes functionality,
could be woken up out-of-order when changing frequencies in DVFS.
This change ensures there is no packet drop during the transition
times.
Change-Id: I40240dcb3e957217abd2d7ad22cc4f78809f7b49
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/44287
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Flitbuffers act as FIFOs for internal links and output queues
in routers. This change replaces the use of vectors with deque
for performance improvements. The older implementation of using
a vector combined with a heap sort was both incorrect and
inefficient.
Incorrect because flit buffers should act strictly
as FIFO, sorting them based on time changes the order which affects
functionality in the case of DVFS enabled NoCs.
Change-Id: Ieba40f85628b7c7255e86792d40b8ce3d7ac34b5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/44286
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The switch allocator implements a two step separable allocator and
utilizes port winner and vc winner data structures for functionality.
This improves the data structures used and their operations to improve
overall performance of the simulation. We start with an invalid output
port(-1) and an invalid vc(-1) and then allocate the outports
and vcs to inport.
Change-Id: I38b70ebdc1a54b8f748c2a5d510814bf139b9eaa
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/44285
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>