If we create abstract memories with a sub-page size on a system with
a shared backstore, the offset of the next mmap might become
non-page-aligned and cause an invalid argument error.
In this CL, we always round the range size up to a multiple of the page
size before updating the offset, so the offset is always on a page
boundary.
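The rounding can be illustrated with a minimal Python sketch. The function
names here are hypothetical and are not taken from the gem5 sources; the
sketch only assumes that mmap offsets must be multiples of the host page
size.

```python
import mmap

PAGE_SIZE = mmap.PAGESIZE  # host page size; mmap offsets must be multiples of it


def round_up_to_page(size, page_size=PAGE_SIZE):
    """Round size up to the next multiple of page_size."""
    return (size + page_size - 1) // page_size * page_size


def assign_offsets(range_sizes, page_size=PAGE_SIZE):
    """Return the backstore offset of each range, in creation order."""
    offsets = []
    next_offset = 0
    for size in range_sizes:
        offsets.append(next_offset)
        # Advance by the rounded-up size so the next offset stays aligned.
        next_offset += round_up_to_page(size, page_size)
    return offsets
```

With a 4096-byte page, ranges of 100, 4096 and 5000 bytes would be placed
at offsets 0, 4096 and 8192.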
Change-Id: I3a6adf312f2cb5a09ee6a24a87adc62b630eac66
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/58289
Reviewed-by: Gabe Black <gabe.black@gmail.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Reviewed-by: Boris Shingarov <shingarov@labware.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This makes what are configuration and what are internal SCons variables
explicit and separate, and makes it unnecessary to call out what
variables to export to C++.
These variables will also be plumbed into and out of kconfiglib in later
changes.
Change-Id: Iaf5e098d7404af06285c421dbdf8ef4171b3f001
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/56892
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Add a utility class that provides a service for another process to
query and get the fd of the corresponding region in gem5's physmem.
Basically, the service works in this way:
1. The client connects to the unix socket created by a SharedMemoryServer.
2. The client sends a request {start, end} to gem5.
3. The server locates the corresponding shared memory.
4. gem5 responds with {offset} and passes {fd} in ancillary data.
Mapping fd at offset with mmap gives the client a view into the physical
memory of the requested range.
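The exchange above can be sketched in Python as follows. This is only an
illustration under stated assumptions: the wire format (three unsigned
64-bit integers), the function names and the use of a temporary file as the
backing store are all hypothetical, and a socketpair stands in for the
named unix socket. It needs Python 3.9 or later on a Unix host.

```python
import mmap
import os
import socket
import struct
import tempfile
import threading

# Hypothetical wire format: request {start, end}, response {offset}.
REQUEST = struct.Struct("<QQ")
RESPONSE = struct.Struct("<Q")


def serve_one(conn, backing_fd, region_start):
    """Server side: reply with {offset} and pass the fd as ancillary data."""
    raw = conn.recv(REQUEST.size, socket.MSG_WAITALL)
    start, _end = REQUEST.unpack(raw)
    offset = start - region_start  # position of the range in the backing file
    socket.send_fds(conn, [RESPONSE.pack(offset)], [backing_fd])


def request_region(conn, start, end):
    """Client side: send {start, end}, receive {offset} and the fd."""
    conn.sendall(REQUEST.pack(start, end))
    data, fds, _flags, _addr = socket.recv_fds(conn, RESPONSE.size, 1)
    (offset,) = RESPONSE.unpack(data)
    return fds[0], offset


def demo():
    page = mmap.PAGESIZE
    base = 0x80000000  # made-up physical base address of the region
    with tempfile.TemporaryFile() as backing:
        backing.truncate(2 * page)
        backing.seek(page)
        backing.write(b"gem5")
        backing.flush()
        server, client = socket.socketpair()
        with server, client:
            worker = threading.Thread(
                target=serve_one, args=(server, backing.fileno(), base))
            worker.start()
            fd, offset = request_region(client, base + page, base + 2 * page)
            worker.join()
        try:
            # Map the received fd at the returned offset to view the range.
            with mmap.mmap(fd, page, offset=offset) as view:
                data = bytes(view[:4])
        finally:
            os.close(fd)
    return offset, data
```

Calling demo() returns the offset reported by the server together with the
first four bytes seen through the client's mapping.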
Change-Id: I9d42fd8a41fc28dcfebb45dec10bc9ebb8e21d11
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57729
Reviewed-by: Gabe Black <gabe.black@gmail.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Reviewed-by: Boris Shingarov <shingarov@labware.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Presumably, these are fixed for whatever protocol gets selected. We
don't need to accumulate includes, we need to set includes to something
in particular. If there is a common include which always needs to be
used, we can handle that in the SConscript separately from
SLICC_INCLUDES.
Change-Id: I996d08566944e38e388dc287f644c40366ebba0d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/56754
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Reviewed-by: Yu-hsin Wang <yuhsingw@google.com>
Add a new option `auto_unlink_shared_backstore` to System so it will
remove the shared backstore used in physical memories when the System is
destructed. This prevents an unintended memory leak.
If the shared memory is designed to live through multiple rounds of
simulations, you may set the option to false to prevent the removal.
Test: Run a simulation with shared_backstore set, and see whether there
is anything left in /dev/shm/ after simulation ends.
Change-Id: I0267b643bd24e62cb7571674fe98f831c13a586d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57469
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Daniel Carvalho <odanrc@yahoo.com.br>
Tested-by: kokoro <noreply+kokoro@google.com>
dGPUs can translate a virtual address and will not know if the address
resides in system/host memory or device/dGPU memory until the
translation is complete. In order to mark requests as going to either
system memory or device memory we add a field to the Request class.
Change-Id: Ib1e80e8d03ecdfeb11c24d979ccc4b912ce07f91
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51852
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
This enhances MOESI_AMD_Base-dir DmaWrite to enable partial writes. This
is currently done by assuming a full cache line, invalidating caches,
and transitioning back to unblocked state. The enhanced write supports
partial writes (i.e., smaller than cache line size) by first reading
memory, merging the modified data, and then writing back to memory.
Implementation of this mirrors that of DmaRead in terms of state. This
means for each DmaRead state (BDR_PM, BDR_Pm, and BDR_M) there is a
write analogue (BDW_PM, BDW_Pm, and BDW_M) and the BDR_P state is
removed. Furthermore, this enhanced DmaWrite ... actually writes data to
memory instead of relying on DirectoryEntry / backing store for correct
data.
There are now two possible state transitions for DmaWrite: (1) memory
data arrives before the probe response and (2) the probe response
arrives before memory data. In case (1), the probe data overwrites the
memory data, the partial write is merged using the TBE write mask, and
the write mask is updated to the 'filled' state. In case (2), the probe
data is merged with the partial data using the TBE write mask, and the
write mask is updated to the 'filled' state. The memory data will then
be clobbered by copying the TBE data over the response since the write
mask is now full.
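The two arrival orders can be modelled with a simplified Python sketch.
This is not the SLICC code: the class and method names are hypothetical,
and the model tracks only one TBE with a per-byte write mask. It shows that
both orders end with the same merged line.

```python
class WriteTBE:
    """Simplified model of a TBE holding a partial DMA write."""

    def __init__(self, line_size, offset, payload):
        self.data = bytearray(line_size)
        self.mask = [False] * line_size  # True where the DMA write put data
        for i, byte in enumerate(payload):
            self.data[offset + i] = byte
            self.mask[offset + i] = True

    def _fill_unmasked(self, src):
        # Copy src only into bytes not covered by the write mask.
        for i, written in enumerate(self.mask):
            if not written:
                self.data[i] = src[i]

    def on_probe_data(self, probe):
        # Probe data replaces everything except the partial write, then
        # the mask is marked as filled.
        self._fill_unmasked(probe)
        self.mask = [True] * len(self.mask)

    def on_memory_data(self, memory):
        if all(self.mask):
            return  # mask is full: the TBE data clobbers the memory response
        self._fill_unmasked(memory)


def run(order):
    tbe = WriteTBE(8, 2, b"WW")
    handlers = {"memory": lambda: tbe.on_memory_data(b"mmmmmmmm"),
                "probe": lambda: tbe.on_probe_data(b"pppppppp")}
    for event in order:
        handlers[event]()
    return bytes(tbe.data)
```

Both run(["memory", "probe"]) and run(["probe", "memory"]) yield the probe
data with the two written bytes merged in.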
Change-Id: I1eebb882b464c4c5ee5fd60932fd38d271ada4d7
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57410
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This protocol is using an old style where read/writes to memory were
being done by writing to a DataBlock in a DirectoryMemory entry. This
results in having multiple copies of memory, leads to stale copies in at
least one memory (usually DRAM), and requires --access-backing-store in
most cases to work properly. This changeset removes all references to
getDirectoryEntry(...).DataBlk and instead forwards those reads and
writes to DRAM always.
Change-Id: If2e52151789ad82c7b55c8fa2b41c1f4e5b65994
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57409
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Previously, all abstract memories backed by the same physical memory
used the exact same chunk of shared memory if sharedBackstore was set.
This means that all abstract memories, despite being set to different
ranges, were still mapped to the same chunk of memory.
As a result, setting sharedBackstore not only allows the host system to
share gem5 memory, it also forces multiple gem5 memories to share the
same content, which significantly affects the simulation result.
Furthermore, the actual size of the shared memory is determined by the
last backingStore created. If the last one is smaller than any previous
backingStore, this may invalidate previously mapped regions and cause a
SIGBUS upon access (on Linux).
In this CL, we put all backingStores of those abstract memories side by
side instead of stacking them all together. The behavior of abstract
memories is therefore consistent whether sharedBackstore is set or not,
while preserving the ability to access those memories from the host.
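The difference between the two layouts can be shown with a small Python
sketch. The function names are hypothetical, and the sketch ignores page
alignment; it only compares where each store lands and how large the
shared file ends up.

```python
def stacked_layout(sizes):
    """Old behavior: every store maps offset 0 and the file ends up as
    large as the last store created."""
    offsets = [0] * len(sizes)
    file_size = sizes[-1] if sizes else 0
    return offsets, file_size


def side_by_side_layout(sizes):
    """New behavior: each store starts where the previous one ended."""
    offsets = []
    total = 0
    for size in sizes:
        offsets.append(total)
        total += size
    return offsets, total
```

For stores of 8192 and 4096 bytes, the stacked layout leaves a 4096-byte
file behind an 8192-byte mapping, which is the SIGBUS case described above,
while the side-by-side layout gives disjoint regions in a 12288-byte file.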
Change-Id: Ic4ec25c99fe72744afaa2dfbb48cd0d65230e9a8
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57369
Reviewed-by: Yu-hsin Wang <yuhsingw@google.com>
Reviewed-by: Gabe Black <gabe.black@gmail.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
It's possible to bridge together the memory interconnect of two
systems, either as parallel peers, or one nested inside the other. Each
System will have its own set of RequestorIDs, and using an ID from one
System inside the other can lead to a number of different problems.
This change adds a new SimObject called SysBridge which connects the
interconnects of two Systems together. The object allocates a requestor ID in
each system, and for all PacketPtrs passing through it, the requestor
ID from the target system is installed in the associated Request. On
the way back, either inline or in a split, delayed response, the
original RequestorID is restored by reinstalling the original Request
object.
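The ID swap can be sketched in Python as below. This is a rough model, not
the SimObject itself: the class names and the dictionary used to remember
the original Request are hypothetical, and only one direction of the bridge
is shown.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Request:
    addr: int
    requestor_id: int


@dataclass
class Packet:
    req: Request


class SysBridgeSketch:
    """Swaps in a target-system requestor ID and restores it on the way back."""

    def __init__(self, target_requestor_id):
        self.target_id = target_requestor_id
        self._original = {}  # id(packet) -> original Request object

    def forward(self, pkt):
        # Remember the original Request, then install a copy carrying the
        # requestor ID allocated in the target system.
        self._original[id(pkt)] = pkt.req
        pkt.req = replace(pkt.req, requestor_id=self.target_id)
        return pkt

    def respond(self, pkt):
        # Reinstall the original Request object, which restores the ID.
        pkt.req = self._original.pop(id(pkt))
        return pkt
```

Keeping the original Request object around, rather than only the numeric
ID, mirrors the description above of restoring the ID by reinstalling the
original Request.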
Change-Id: I237c668962a04ef6dfc872df16762a884c05ede9
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/54743
Reviewed-by: Jesse Pai <jessepai@google.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Some ISAs implement TLB invalidation across multiple cores (TLB
shootdown) by broadcasting invalidation messages to every PE in a
target shareability domain.
These messages originate from specific instructions and can be
categorized into two macro groups:
1) TLB Invalidation instructions: generating the invalidation
request
Example:
* Arm: TLBI instruction [1]
* AMD64: INVLPGB instruction [2]
2) TLB Invalidation sync instructions: serialization point, ensuring
completion of outstanding invalidation requests
Example:
* Arm: DSB instruction [1]
* AMD64: TLBSYNC instruction [2]
This patch is introducing TLBI and SYNC operations in the memory
subsystem by adding the following Request flags:
* TLBI (1)
* TLBI_SYNC (2)
JIRA: https://gem5.atlassian.net/browse/GEM5-1097
[1]: https://developer.arm.com/documentation/ddi0487/gb/
[2]: https://www.amd.com/system/files/TechDocs/24594.pdf
Change-Id: Ib5b025d0f6bc0edaf4f11a66593947a72ba32b8f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/56596
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This protocol is using an old style where read/writes to memory were
being done by writing to a DataBlock in a DirectoryMemory entry. This
results in having multiple copies of memory, leads to stale copies in at
least one memory (usually DRAM), and requires --access-backing-store in
most cases to work properly. This changeset removes all references to
getDirectoryEntry(...).DataBlk and instead forwards those reads and
writes to DRAM always.
This results in new transient states BL_WM, BDW_WM, and B_WM which are
blocked states waiting on memory acks indicating a write request is
complete. The appropriate transitions are updated to move to these new
states and stall states are updated to include them. DMA write ACK is
also moved to when the request is sent to memory, rather than when the
request is received.
Change-Id: Ic5bd6a8a8881d7df782e0f7eed8be9d873610e04
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/56446
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The directory has an assert that there is at least one destination for a
probe when sending an invalidation or shared probe to coherence end
points in the protocol (TCC, LLC). This is not necessarily required, and
for certain configurations there will be no probes required and none
will be sent. One such configuration is the GPU protocol tester which
would not require a probe to the CPU if it does not exist.
To fix this we first collect the probe destinations. Then we check if
any destinations exist. If so, we send the probe message. Otherwise we
immediately enqueue a probe complete message to the trigger queue. This
reorganization prevents messages with no destinations from being
enqueued, meeting the criteria for the assertion.
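The reorganized flow can be sketched in Python as follows. The function
and queue names are hypothetical and the queues are plain lists; the sketch
only shows the order of operations: collect destinations first, then either
send the probe or enqueue a completion trigger.

```python
def send_probes(candidates, exists, probe_queue, trigger_queue):
    """Collect destinations, then probe them or signal completion."""
    # Step 1: collect only the destinations present in this configuration.
    destinations = [c for c in candidates if exists(c)]
    if destinations:
        # Step 2a: there is at least one destination, so the probe message
        # satisfies the non-empty-destination assertion.
        probe_queue.append(("probe", destinations))
    else:
        # Step 2b: nothing to probe, so report completion immediately.
        trigger_queue.append("probe_complete")
    return destinations
```

In a tester-like configuration without a CPU, exists would return false for
the CPU endpoint and the call would take the second branch.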
Change-Id: If016f457cb8c9e0277a910ac2c3f315c25b50ce8
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/55543
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
JIRA: https://gem5.atlassian.net/browse/GEM5-1185
Fixed an issue in which a CleanUnique responder would incorrectly
deallocate the cache block when handling a stale CU when the state
is UD_RU or UC_RU (thus incorrectly transitioning to RU).
The fix is to handle stale CUs similarly to stale WBs where we
override the dataValid TBE field to prevent the wrong state
transition.
This patch moves the stale code path to a separate transition
(similarly to stale WBs/Evicts) and moves the dataValid override to
Initiate_Request_Stale so it applies to all stale request types.
Notice now the stale field is also set on stale Comp_UC responses.
Additional minor change: CheckUpgrade_FromRU is the same as
CheckUpgrade_FromStore so it was removed.
Change-Id: I0a2cedcfde1dc30d67aa2c16d71b7470369c2b6e
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/56810
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Meatboy 106 <garbage2collector@gmail.com>
To find a candidate in cache base.cc, the function getPacket is
called. With multiple prefetchers, we always start from the first
prefetcher. Given that the default value for "latency" is 1, there is
always a candidate ready for prefetch by prefetcher 0.
Hence, we need an arbitration mechanism to cycle through all
prefetchers. To make this fair, we added a variable that saves which
prefetcher was first used to get a packet from, and in the next round
we start from the next prefetcher, giving every prefetcher a chance
to be the first one in a round-robin fashion.
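A minimal Python sketch of such a round-robin arbiter is shown below. The
class names and the last_used variable are hypothetical stand-ins, not the
identifiers used in the cache code.

```python
from collections import deque


class QueuePrefetcher:
    """Stand-in prefetcher that hands out queued packets."""

    def __init__(self, packets):
        self.packets = deque(packets)

    def get_packet(self):
        return self.packets.popleft() if self.packets else None


class RoundRobinArbiter:
    def __init__(self, prefetchers):
        self.prefetchers = prefetchers
        self.last_used = -1  # index of the prefetcher served most recently

    def get_packet(self):
        count = len(self.prefetchers)
        # Start from the prefetcher after the one served last time.
        for step in range(1, count + 1):
            index = (self.last_used + step) % count
            packet = self.prefetchers[index].get_packet()
            if packet is not None:
                self.last_used = index
                return packet
        return None  # no prefetcher has a candidate
```

With two prefetchers that always have a candidate, successive calls
alternate between them instead of always serving prefetcher 0.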
JIRA Ticket: https://gem5.atlassian.net/browse/GEM5-1169
Change-Id: I1c6a267b2bf71764559a080371c1d7f8be95ac71
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/56265
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Daniel Carvalho <odanrc@yahoo.com.br>
Tested-by: kokoro <noreply+kokoro@google.com>
In SimpleNetwork, switches were assigned an index depending on their
position in params().routers. But switches are also referenced by their
router_id parameter in other locations of the ruby network system (e.g.,
src and dst node parameter in links). If the router_id does not match the
position in SimpleNetwork::m_switches, the network initialization might
fail or implement a different topology from what the user intended. This
patch fixes this issue by storing switches in a map instead of a vector.
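The idea can be illustrated with a short Python sketch. The Router class
and the string placeholders for switches are hypothetical, and the
duplicate-ID check is an addition for the example rather than something
stated in this change.

```python
from dataclasses import dataclass


@dataclass
class Router:
    router_id: int


def build_switch_map(routers):
    """Key each switch by its router_id instead of its list position."""
    switches = {}
    for router in routers:
        if router.router_id in switches:
            raise ValueError(f"duplicate router_id {router.router_id}")
        switches[router.router_id] = f"switch{router.router_id}"
    return switches
```

For routers listed as IDs 5 and 2, a link naming router 5 resolves to the
right switch through the map, whereas positional indexing would look for a
sixth list entry that does not exist.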
Change-Id: I398f950ad404efbf9516ea9bbced598970a2bc24
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/55723
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Setting the physical_vnets_channels parameter enables the emulation of
the bandwidth impact of having multiple physical channels for each
virtual network. This is implemented by computing bandwidth on a
per-vnet/channel basis within Throttle objects. The size of the
message buffers is also scaled according to this setting (when buffers
are not unlimited).
The physical_vnets_bandwidth can be used to override the channel width
set for each link and assign different widths for each virtual network.
The --simple-physical-channels option can be used with the generic
configuration scripts to automatically assign a single physical channel
to each virtual network defined in the protocol.
JIRA: https://gem5.atlassian.net/browse/GEM5-920
Change-Id: Ia8c9ec8651405eac8710d3f4d67f637a8054a76b
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/41854
Reviewed-by: Meatboy 106 <garbage2collector@gmail.com>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
The 'max_dequeue_rate' parameter limits the number of messages that can
be dequeued in a single cycle. When set, 'isReady' returns false once
max_dequeue_rate is reached.
This can be used to fine tune the performance of cache controllers.
For the record, other ways of achieving a similar effect could be:
1) Modifying the SLICC compiler to limit message consumption in the
generated wakeup() function
2) Setting the buffer size to max_dequeue_rate. This can potentially cut
the expected throughput in half. For instance, if a producer can
enqueue every cycle, and a consumer can dequeue every cycle, a
message can only be actually enqueued every two (assuming
buffer_size=1) since the buffer entries available after dequeue
are only visible in the next cycle (even if the consumer executes
before the producer).
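The effect of max_dequeue_rate on 'isReady' can be sketched in Python as
follows. This is a simplified model with hypothetical names; it assumes a
value of 0 means "unlimited" and that the caller passes the current cycle
explicitly.

```python
from collections import deque


class MessageBufferSketch:
    def __init__(self, max_dequeue_rate=0):
        self.queue = deque()
        self.max_dequeue_rate = max_dequeue_rate  # 0 = unlimited (assumption)
        self._cycle = None   # cycle in which the last dequeue happened
        self._dequeued = 0   # messages dequeued during that cycle

    def enqueue(self, message):
        self.queue.append(message)

    def is_ready(self, cycle):
        if not self.queue:
            return False
        limited = self.max_dequeue_rate > 0
        if limited and cycle == self._cycle:
            # Not ready once the per-cycle limit has been reached.
            return self._dequeued < self.max_dequeue_rate
        return True

    def dequeue(self, cycle):
        if cycle != self._cycle:
            # First dequeue of a new cycle resets the counter.
            self._cycle = cycle
            self._dequeued = 0
        self._dequeued += 1
        return self.queue.popleft()
```

With three queued messages and a limit of two, the buffer stops reporting
ready after the second dequeue in a cycle and becomes ready again in the
next cycle.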
JIRA: https://gem5.atlassian.net/browse/GEM5-920
Change-Id: I3a446c7276b80a0e3f409b4fbab0ab65ff5c1f81
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/41862
Reviewed-by: Meatboy 106 <garbage2collector@gmail.com>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>