Previously the LDS bus latency was not configurable in the GPU config
scripts. As a result, the simulations would use the default value from
GPU.py. This commit adds support to change this value as an input
option. The option to use is "--vrf_lm_bus_latency".
Change-Id: I8d8852e6d7b9d03ebec1fe8b392968f396dd3526
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63652
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair.wisc@gmail.com>
AddrRangeMap::intersects doesn't support ranges with different
interleavings, thus the current implementation of the destination
seach won't work in cases when different machines map the same address
with different interleaving.
The fixed implementation uses a different AddrRangeMap for each mach
type.
Change-Id: Idd0184da343c46c92a4c86f142938902096c2b1f
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63671
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
In AmbaToTlmBridge we copy the control signal from amba extension to
SystemC extension. This makes gem5 models can proceed the correct
control signals. We need to make the same thing in AmbaToTlmBridge for
fastmodel can proceed the correct control signals.
A practical example is given a request is generated by fastmodel CPU,
translated by gem5 MMU, and routed to a fastmodel target. The secure bit
may be changed by MMU according to the PTE. We need to update the amba
extension in AmbaFromTlmBridge to make the target get the correct
information.
Change-Id: I600be7ba21368f00c05426ac1db3c28efd6ca2ea
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63773
Maintainer: Gabe Black <gabe.black@gmail.com>
Reviewed-by: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Given a data path initiated from SystemC, routed by gem5, and handled
by SystemC finally.
SystemC -> gem5 -> SystemC
The target SystemC needs to get the original transaction object.
Otherwise, it would lose the extensions in the payload.
To fix the issue, we moves the SenderState class to public for reachibility.
After that, we refactor the logic converting between payload and packet
to make sure they can use the correct instance. Finally, we fix the
potential address change during routing.
Change-Id: Ic6d24e98454f564f7dd6b43ad8c011e1ea7ea433
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63771
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Gabe Black <gabe.black@gmail.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
This part of the Kokoro presubmit tests was designed to ensure gem5
still compiled sucessfully with Clang and to the '.fast' variant. ARM
was chosen arbitarily. Now that ALL exists, it makes more sense to use
it for this test.
Change-Id: Ia3593f7dd506205da13802a69094f4dd7019ab90
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63371
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Modeled after the HiFive Unmatched.
For the cache, we inherited from AbstractClassicCacheHierarchy and
AbstractTwoLevelCacheHierarchy to make a PrivateL1PrivateL2 hierarchy
with the same associativity and sizes as the board. However, the L2
Size is kept as a parameter that can be set by the user.
The core is in-order, therefore we inherited from RISC-V MinorCPU and
used the same pipeline parameters as the ARM HPI CPU, except the
decodeToExecuteForwardDelay, which was increased to 2 to avoid a
PMC access fault.
For the processor, we initialized the core with an ID so that we can
return 4 cores in FS mode, which is the same as the real system,
and 1 in SE mode.
For the memory, we just have a single channel DDR4_2400 with a size of
16 GB and starting at 0x80000000.
For the board, we declared a Boolean variable as a parameter to assign
whether it is in FS mode or not. We inherited from KernelDiskWorkload
and SEBinaryWorkload and initialized the components of the board
according to the Boolean. The other parameters are the clock frequency
and the L2 cache size.
Jira Issue: https://gem5.atlassian.net/browse/GEM5-1257
Change-Id: Ic2c066bb3a41dc2f42566ce971f9a665542f9771
Co-authored-by: Jasjeet Rangi <jasrangi@ucdavis.edu>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63411
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
The KVM_ISA setting was moved into a CONF dict, but the code which
ensured it had a default if there was no possible KVM hosting ISA was
still setting that variable in the base environment dict. This moves
the setting into the CONF dict instead.
Change-Id: I067c969dd761b2cdb098bcba6cd6a4b643d2d427
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63752
Reviewed-by: Earl Ou <shunhsingou@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Gabe Black <gabeblack@google.com>
Finish_CopyBack_Stale is scheduled only when the requestor is the last
sharer. This prevents the cacahe evicting the line which was already
evicted while the stale WriteBack transaction was stalled.
Wrong condition check in Finish_CopyBack_Stale for eviction is also
removed.
Change-Id: Ib66acc1b9e4a6f7cea373e1fb37375427897d48d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63611
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
While the `runtime` module's `get_runtime_isa` function throws a warning
to remind user's the function is deprecated, this was not always helpful
to a user when setting a processor without a target ISA.
This change adds additional warnings to the SimpleSwitchableProcessor
and the SimpleProcessor. These warnings explain not explicitly setting
the ISA via the processor's constructor is deprecated behavior.
Change-Id: I994ad8355e0d1c3f07374bebe2b59106fb04d212
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63331
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
Prior to gem5 v21.2, partial translation entries were not cached within
the TLB, therefore Last Level (only) TLBI instructions were invalidating
every entry.
Now that we store translations from several lookup levels we are
currently over-invalidating partial translations. This patch is
adding a boolean flag to TLBIMVAA and TLBIMVA, allowing to discard
a match if the TLBI is targeting complete translations only
and the entry holds a partial translation
Change-Id: I86fa39c962355d9c566ee8aa29bebcd9967c8c57
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/62453
Tested-by: kokoro <noreply+kokoro@google.com>
Misaligned access is an optional feature of riscv isa, but it is always
enabled in current gem5 model. We add a flag into the ISA class to turn
off the feature.
Note this CL only consider the load/store instruction, but not the
instruction fetch itself. To support instruction address fault, we'll
need to modify the riscv decoder.
Change-Id: Iec4cba0e4fdcb96ce400deb00cff47e56c6d1e93
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63211
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Gabe Black <gabeblack@google.com>
The C and C++ standards allows the character type char to be signed or
unsigned, depending on the platform and compiler. Most systems,
including x86 GNU/Linux and Microsoft Windows, use signed char, but
those based on PowerPC and ARM processors typically use unsigned char
This means testing for:
EXPECT_FALSE(parser.parse("255", value));
is not portable as Arm platforms are able to convert 255 into an unsigned
character. We are fixing this portability issue by performing
different checks depending on the platform.
Maybe a better solution would be to explicitly set the sign of the
char (signed char in this case)
Change-Id: I44dd84378ea62ae21a6b03e1f35119bf85f8c799
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63539
Maintainer: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
This change adds a new traffic generator module to the standard
library that can read a .cfg file describing the traffic pattern
as a state machine. It wraps around the TrafficGen SimObject.
In addition it adds a method to ComplexGenerator to set the
traffic from outside using python generators like the example
found in configs/dram/sweep.py.
Change-Id: I5989bb900d26091e6e0e85ea63c741441b72069c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/62473
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
Add control/branch instructions committed stat to Minor CPU. The stats
can be found in board.processor.cores.core.committedControl_0 in
stats.txt. The stats counted are IsControl, IsDirectControl,
IsIndirectControl, IsCondControl, IsUncondControl, IsCall, and IsReturn.
IsControl tracks the total control/branch instructions committed.
Use inst->staticInst->isControl() flag to determine if an instruction is
a control or not, and then using other flags in the StaticInstFlags to
determine the type of control instruction and tracking the committed
ones.
Jira Issue: https://gem5.atlassian.net/browse/GEM5-1283
Change-Id: Iee1010fdf0fa4078ebe1c56b437295abdb5f4469
Co-authored-by: Kunal Pai <kunpai@ucdavis.edu>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63358
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: ZHENGRONG WANG <seanyukigeek@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
The current MI_example protocol's L1 caches updates the MRU information twice per store requests that miss -- once when the request reaches Ruby and once when the store miss is returned from another level of the memory hierarchy.
Although this approach does not cause any correctness bugs for replacement policies like LRU since this request is the LRU in both cases, it does not work correctly for other policies like SecondChance and LFU, where updating the information twice (for misses) causes them to devolve to LRU.
Note that this was not directly a problem with Ruby previously, because it only supported LRU-based policies that were unaffected by this. However, with the integration of 20879 Ruby now uses the same replacement policies as Classic (which has additional, non-LRU based replacement policies).
This patch resolves this problem by not updating the MRU information a second time for the misses. It has been tested and validated with the replacement policy tests in 20880, and it modifies the store instead of the load in 62232.
Change-Id: I8436e3e537da0ee5841c59a94fa5e5c30105529f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63191
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
When debugging strange addresses, it is extremely useful to know *what*
instruction calculated that address. This make it much easier to follow
assembly code backwards to find the source of an incorrect address.
This change adds a DPRINTF for GPUTLB that by default prints the
disassembly when a virtual address translation is sent to the TLB.
Change-Id: I5066c064a48c5c48696863eeccd8d011245ef7b2
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63176
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The previous print statement was not clear that a scalar offset was
being used when printing disassembly, which made it slightly more
difficult to track down bugs related to this (relatively) rare usage of
global load/store instructions.
This change improves the disassembly to closer match the output of
hipcc's assembly code output.
Change-Id: I8514aedacb5b1db93d0586c408c4cf1ce77a7db3
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63175
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
SDMA atomic packets are used in conjunction with RLC queues in SDMA for
synchronization similar to how HSA signals are used with BLIT kernels
when SDMA is disabled. Implement a skeleton of the SDMA atomic packet
methods as well as the atomic add64 operation.
The atomic add operation appears to be the only operation used in ROCm,
so this implementation is fairly complete. See:
https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/
rocm-4.2.x/src/core/runtime/amd_blit_sdma.cpp#L880
Change-Id: I62cc337f2ffe590bdb947b48053760ee8b3a6f32
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63174
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
There can only ever be two RLC queues maximum. Use this information for
a simpler data structure to store doorbell information. The patch
changes the std::unordered_map previously used to std::array. This will
also be useful in avoiding erase-while-iterating issues needed to
unregister all queues at once.
Change-Id: I95600e40de51cb1a992a20bcebaf7580ea4d0be8
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63172
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Previously framebuffer reads would try reading from MMIO trace, special
addresses, and then anything previously written to a special address
range. This does not handle direct large BAR reads, causing incorrect
results in some applications that were doing this.
Rework the readFramebuffer method to do the following. Remove the MMIO
trace read altogether, as there were not any framebuffer reads from the
trace to begin with. Read special addresses first to avoid overwriting
by previous writes. Next read previous writes to special ranges. The
special range is the GART table. These are required for functional
translations. Lastly read from the device memory directly. This does a
functional read required by the PCIDevice read method which is
non-timing. Reading from device memory is preferred over the map type
used for GART to avoid duplication of a potentially huge amount of data.
With this changeset all but one of the HIP samples and HIP examples
applications now run and pass verification of results.
Change-Id: Id3b788bfc5eaf17cfa1897f25d26f3725d4db321
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63171
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>