The latest version of DRAMSys required several API changes which were
applied in this commit.
Also, the README for the usage of DRAMSys has been updated.
The updated version fixes a bug in DRAMSys that caused some full-system
simulations to loop endlessly.
GitHub Issue: https://github.com/gem5/gem5/issues/1452
Currently the RubySystem pointer is set when set_tbe is performed, which
effectively clears the NetDest objects from the TBE (if any). This is
fine if the TBE has been just allocated before set_tbe is called (no
NetDest info in the TBE). However, the CHI protocol has an action
(RestoreFromHazard) that performs a set_tbe over a TBE that had already
been set, i.e., it already has valid NetDest data.
This patch sets the RubySystem pointer when the TBE is allocated, which
is more natural and follows the style already adopted in the
PerfectCacheMemory class (#1864).
Co-authored-by: Adrià Armejach <adria.armejach@bsc.es>
Classes CHI_RNF and CHI_MN can be specialized to override base
class/subclass attributes, like it happens in CustomMesh with
router_list (see configs/example/noc_config/2x4.py). To avoid missing
these attributes, it is needed to generalize the class types when
instantiating the objects in the recently added generators.
This PR adds documentation to the standard library using Sphinx. For
details on how the documentation was generated, refer to
https://gem5.atlassian.net/browse/GEM5-1314. Currently, some modules
like `dramsys` and `mesi_three_level` appear as blank pages. To view the
current state of the documentation locally, run: `cd docs/_build/html;
python3 -m http.server 8000`
---------
Co-authored-by: ivanaamit <ivanamit91@gmail.com>
MOESI_CMP_directory protocol crashes with one of the several assertions
in NetDest.cc. It happens because the entry type used to instantiate a
PerfectCacheMemory object in MOESI_CMP_directory-L2cache.sm contains a
NetDest object, so it requires a RubySystem object to be manually set
for it.
Instead of just receiving the block size, change PerfectCacheMemory to
receive a RubySystem object and use it to set the block size and call
ENTRY::setRubySystem if the entries require it.
After the support for multiple ruby protocols was added, the macros
PROTOCOL_MESI_Two_Level and PROTOCOL_MESI_Three_Level were removed.
These macros are still being used to determine if Load_Linked requests
are sent to the protocol, an information required by the fix that
addresses LL/SC livelock.
Replace the macros with a new option: useSecondaryLoadLinked.
The transition that happens when TCC acknowledges TCP of an atomic
operation completion does not move the cacheline state from A to I. This
commit fixes the transition and moves the state to I
When the cache is performing an atomics and receives data, it performs.
pa_performAtomic. This action peeks into the coreRequest queue to check
the messaage type. This queue, however, is already dequeued in the
transition that precedes the one that contains pa_performAtomic. When
pa_performAtomic is called, the simulation crashes. This commit fixes
the crash by using the TBE entry information instead of peeking when TBE
entry exists, and peeking when it doesn't
Previously, a warning would be printed when the WPRI bits
in the senvcfg register were written to. The other registers
do not print warnings for this, so the warning is being
removed.
This commit adds the senvcfg CSR, which fixes the 6.11.3 kernel
crash documented in issue 1674. I have not added a bitfield and
its implementation in isa.cc only uses setMiscRegNoEffect, so
this implementation is likely missing some critical components.
The previous #484 issue reported a bug where the TLB stats on RISC-V
were incremented twice on misses by calling the `lookup` function twice
with hidden argument set to `false`. The fix is only applied on atomic
mode as the `translation` argument of `doTranslate` will not be
`nullptr` in timing mode.
In that case, if the TLB lookup miss, the `doTranslate` function will
start the walker and then return without doing anything more. Then
later, when the pagetable walker found the corresponding PTE, it will
insert it and call `translateWithTLB`. This function then call `lookup`
again which will hit in any case (and crash if not due to the following
assert), but the hit count is incremented here too.
This commit fix by setting the `hidden` argument of `lookup` to true.
GCC and CLANG have different annotations for declaring code should not
be optimized. Adding GEM5_NO_OPTIMZE provides gem5 developers a MACRO
that works in both cases.
This change replaces the GCC pragmas in vfp.hh with GEM5_NO_OPTIMIZE
as this solution didn't work with clang.
In MI_example, when in MI state the block "Maybe_Stale" as in this
controller may have the most up to date value or it could be in the
network. For MII it is guaranteed that this controller has the most up
to date value because it received a PUTX_NACK.
This fixes one of the daily test failures.
- In the new MultiRuby system, the generated ProtocolInfo header files were not being correctly added to the build targets in SCons.
- As a result, when building gem5 with the --duplicate-sources option, these files were mistakenly deleted by SCons.
This happened because SCons treated them as source files instead of generated build targets.
- This commit ensures that the ProtocolInfo header files are explicitly included in the build targets, preventing their unintended removal and fixing the build issue.
The test `ruby_test_test-ALL-x86_64-opt-MatchStdout` is currently
failing because the reference file doesn't match the actual output. This
PR changes the reference file to match.
Fix#1809. Shift the mmap end to a lower address in case the process has
a large max stack size, to avoid overlapping the stack with the mmap
memory range.
Change-Id: Idae343dbbe851a7510463ff141c03f1847e36328
Previous PR #1758 implements the generic getValidAddr to get pure
vaddr without any tags or sign-extend bits.
In RISC-V implementation, the getValidAddr will zero-extend
address in RV32 mode and use it to do TLB translation. Use
getValidAddr to get zero-extend vaddr can reduce zero-extend
repetition
Change-Id: I2273ce48bccb873790103ba0fcdb0b48de9ced4c
src/cpu/simple/probes/LooppointAnalysis.py:
- remove default values for bb_valid_addr_range and
marker_valid_addr_range
- add more comments to explain parameter behaviors
- add citation to the LoopPoint paper
src/cpu/simple/probes/looppoint_analysis.cc:
- fix the incorrect styles
- remove updateBackwardBranch() function call
- match the style of checking if listeners vector is empty
- change the way of stopListening() to remove the listeners through the
manager instead of through the ProbeListener object's destructor.
src/cpu/simple/probes/looppoint_analysis.hh:
- removed backwardBranchPC and use the backwardBranchCounter to replace
its functionaility. Therefore, also removed updateBackwardBranch
function.
Change-Id: Id2430e2f04e61f72d5c4f1aad5cfd4d24a0fbc45
Add comments to most variables and functions.
Change the naming of some variables and functions to improve the
clearness.
Change-Id: Idb557ec84698b4344ed4683f5de87b1a3c2fd66d
In looppoint_analysis.hh, added LooppointAnalysis and
LooppointAnalysisManager classes.
Added all functions and variables for the classes.
Comments needed.
Change-Id: Ia7425b672ef092a68c99b702136850bfa1fcf0a2
Because the LoopPoint analysis will be done with ATOMIC CPU, so all
files related to the LoopPoint analysis object will be under
/src/cpu/simple/probes.
Change-Id: Icbdb0742b712a23dc8f6a19f4c1c827a1f5bf288
This change updates the names for the GPU children in a better way than
overriding the parent. Now it looks something like
```text
board.gpus.shader.CUs00
board.gpus.gpu_caches.ruby_gpu.controllers02
board.gpus.memory.mem_ctrl0
```
Note that it is "gpus" with an "s" because the board accepts more than 1
GPU, optionally.
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
This function returns the GPUs (for now, possibly other devices in the
future). It needs to be in the abstract board so the GPU-specific cache
hierarchies can be used with non-GPU boards.
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
This prepends loading the GPU drivers to anything passed in via the
readfile_contents. Note that if the user sets a specific readfile via a
file they will be responsible for loading the driver
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
This change separates the instantiation of gpu memory from
instantiatiing the gpu cache. Prior to this change, the gpu
cache instantiated the memories for the gpu by receiving number
of channels as a parameter. With this change, the gpu memory
should be constructed outside the gpu, without being added as a
child to any other object, and passed to the constructor of
the gpu.
This change adds a new method to AbstractMemorySystem to allow
getting its objects of the class MemInterface. This is useful
when certain other classes require a list of MemInterface objects
to create physical memory. In addition, ChanneledMemory and
HighBandwidthMemory implement this function.
Adds GPU_VIPER protocol related caches to stdlib components: CorePair
cache, TCP, SQC, TCC, Directory, and DMA controllers. Adds GPU related
components in a new components/devices/gpus/ directory. Adds prebuilt
GPU and CPU cache hierarchies, GPU and CPU network classes, and a board
overriding the X86Board to provide helper methods for disk image root,
the complex kernel parameter list, and method to provide functionality
to the current GPUFS scripts to load in applications and handle loading
the GPU driver.
The new GPU components can be used as follows:
- Create a GPU device *before* the CPU cache hierarchy is created.
- Add the GPU's CPU-side DMA controllers to the list of CPU cache
controllers.
- Use GPU device method to connect to an AbstractBoard.
Each GPU components has it's own RubySystem, PCI device ID, and address
ranges for VBIOS and legacy PCI BARs. Therefore, in theory, multiple
GPUs can be created. This requires PR #1453 .
An example of using this board is added to configs/example/gem5_library
under x86-mi300x-gpu.py. It is designed to work with the disk image,
kernel, and applications provided in the gem5-resources repository.
Change-Id: Ie65ffcfee5e311d9492de935d6d0631260645cd3
This commit is adding two python files:
* ruby_mem_test.py is the canonical gem5 configuration script,
and it is an adaptation of the existing ruby_mem_test.py [1].
The main difference is the use of the TlmController as a
cache controller, and the use of TlmGenerator instead of
the MemTest memory tester. The config is minimally setting up
the system. The extent of the testing is specified in the second
python file:
* read_shared_unit.py: "unit-test" for the CHI ReadShared request
The file should be passed to the ruby_mem_test.py as cmdline
argument:
build/ARM/gem5.opt <>/ruby_mem_test.py <>/read_shared_unit.py
This is a simple testing file. We should ideally generate separate
test files for separate transactions/scenarios.
The test file can have whatever for inside, it only needs to comply
to the minimal interface required by the ruby_mem_test.py, which is
to define the following function:
def test_all(generator):
[1]: https://github.com/gem5/gem5/blob/stable/\
configs/example/ruby_mem_test.py
Change-Id: I767ede9b8572f3eafe677c84da45fd904d77e319
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This commit is building over the CHI-TLM wrapping introduced
by the previous commit and it is adding a CHI traffic generator
as a SimObject.
This will get the python objects as input and it will forward
them to the TlmController to convert them into ruby CHI
messages
Change-Id: Ia67094c9bb880e37b24184313df546ecbaa3289f
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This commit is wrapping the external AMBA CHI-TLM with pybind11
so that it will be possible to use its data structures/functions
from python.
More specifically we will be able to instantiate a ARM::CHI::Payload
and ARM::CHI::Phase from a gem5 config, with the end goal of being
able to configure a CHI transaction from python
Change-Id: I9587b445c21df44161fa3d9e09fc2651541b38bd
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This commit is extending the previously defined CHIGenericController
to implement a CacheController which acts as a bridge between the
AMBA TLM 2.0 implementation of CHI [1][2] with the gem5 (ruby) one.
In other words it translates AMBA CHI transactions into ruby
messages (which are then forwarded to the MessageQueues)
and viceversa.
ARM::CHI::Payload, CHIRequestMsg
<--> CHIDataMsg
ARM::CHI::Phase CHIResponseMsg
CHIDataMsg
[1]: https://developer.arm.com/documentation/101459/latest
[2]: https://developer.arm.com/Architectures/AMBA#Downloads
Change-Id: I6f35e7b4ade4d0de1b5e5d2dbf73ce796a9f9fb6
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This commit allows top level configs making use of the Ruby module
to define node generation callbacks.
The config_ruby function will check the system object for two
factory methods
1) _rnf_gen, if defined, will be called to generate RNFs
2) _mn_gen, if defined, will be called to generate MNs
Change-Id: I9daeece646e7cdb2d3bfefa761a9650562f8eb4b
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Component implementing a generic controller that allow classic caches
interaction with Ruby/CHI.
The CHIGenericController provides an interface to send/receive CHI
messages to/from the interconnect. This is implement in C++ rather then
SLICC. This controller is seen as a MachineType:Cache by the CHI
implementation in SLICC.
Change-Id: I3afc4363f4290095c2f7428c8487bccd932e0300
This snoop reponse is not generated internally by the SLICC
implementation, but is required for compatibility with classic caches
which may remain in SD state while returning SC data upon receiving
a converted SnpShared.
Change-Id: I5270b29c8863c7afd8abc39b3c7978b95330c183
Vector slide instructions can have the same register group as source and
destination.
Because we are pinning the destination this will provoke an
auto-reference in the dependency graph.
The solution is to use the `vcpy` micro. This way we use the `vtmp`
register group as source and pin the destination without issues.
1. Add missing override to `print` function.
2. Change `TlbEntry` to struct in `ArmISA` class.
This was found attempting to compile gem5 on MacOS (Apple Silicon) with
clang v19.
This commit removes the `protocol=None` argument in various
gem5_verify_config()s because protocol is set to None by
default. Also, two print statements that were taken out in
previous commits were put back in with different function calls.
This commit changes the fs/linux/arm and learning_gem5 tests as
they were previously failing with the Ruby change. The
fs/linux/arm long tests require the addition of a new gem5 build,
ARM_X86, which builds the ARM and X86 ISAs with the
MESI_Two_Level cache hierarchy.
This commit changes MESIThreeLevel, MIExample, and OctopiCache
so they work with this PR. It also adds MESIThreeLevel and
OctopiCache to the testlib tests.
This commit renames NULL_All_Ruby to NULL_ALL_RUBY and adds a
tag for this build for testlib. These changes were made so the
NULL_All_Ruby build could be used with testlib.
This commit adds trusted_stats.json files for CHIL1,
MESI_Three_Level, MIExample, and OctopiCache. These jsons are
used to verify the output of tests from test_memory_traffic_gen.py.
This adds all of the hierarchies supported in the standard library. We
can soon move to using a different build target and run all hierarchies!
Change-Id: Ic065a679ea34c3bb2f71b3b133806d240039fbb5
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
This change does many things, but they must all be atomically done.
**USER FACING CHANGE**: The Ruby protocols in Kconfig have changed names
(they are now the same case as the SLICC file names). So, after this
commit, your build configurations need to be updated. You can do so by
running `scons menuconfig <build dir>` and selecting the right ruby
options. Alternatively, if you're using a `build_opts` file, you can run
`scons defconfig build/<ISA> build_opts/<ISA>` which should update your
config correctly.
Detailed changes are described below.
Kconfig changes:
- Kconfig files in ruby now must all be declared in the ruby/Kconfig
file
- All of the protocol names are changed to match their slicc file names
including the case
- A new option is available called "Use multiple protocols" which should
be selected if multiple protocols are selected. This is only used to
set the PROTOCOL variable to "MULTIPLE" when in multiple mode.
- The PROTOCOL variable can now be "MULTIPLE" which means it will be
ignored. If it's not "MULTIPLE" then it holds the "main" protocol,
which is necessary for backwards compatibility with the Ruby.py files.
Ruby config changes:
To make this change backwards compatible with Ruby.py, this change adds
a new "protocol" config called MULTIPLE.py which is used to allow the
user to set a "--protocol" option on the command line. This is only
needed if you are using a gem5 binary with multiple protocols but need
to use Ruby.py.
stdlib changes:
- Make the coherence protocol file behave like the ISA file
- Add a function to get the coherence protocol from the `CacheHierarchy`
like we do with the ISA in the `Processor`.
- Use this function where `get_runtime_coherence_protocol` was used
- Update the requires code to work with the ne CoherenceProtocol
- Fix a typo in the AMD Hammer name and also add the missing MSI
protocol
Scons changes:
- In Ruby we now gather up all of the protocols and build them all if
there are multiple protocols
- There's some bending over backwards to tell the user if they are using
an out of date gem5.build/config file and how to update it
- Note that multiple ruby protocols adds a significant amount of time to
the build since we have to run slicc twice for each file.
build_opts:
- Update all files with new names
- Add a new NULL_All_Ruby that will be used for testing
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
This removes two #defines: PARTIAL_FUNC_READS and PROTOCOL_<protocol>.
Instead, update the code to use the runtime information about which
protocol we are using.
Change-Id: Icb6f10fc2d3fd59128c62f9f6e37b52ef2581b61
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Add a ProtocolInfo class that is specialized (through inheritance) for
each protocol. This class currently has the protocol's name and any
protocol-specific options (partial_func_reads is the only one so far).
Note that the SLICC language has been updated so that you can specify
the options in the `protocol` statement in the `.slicc` file.
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Update the MOESI_AMD_Base and GPU_VIPER configuration files with the new
full protocol-specific names for the controllers instead of the
deprecated names.
Note: If you have any files which use the `CntrlBase` base, you will
likely need to update the class names that you are inheriting from.
Change-Id: I623fea7dd4cd151f7b15fe7cb43f8a4c45492d89
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Update the standard library Ruby protocols to use the protocol-specific
class names instead of the deprecated general names.
Unfortunately, some code became duplicated between similar controllers.
I tried multiple inheritance, but it didn't work out for me. I think the
correct solution is to move some of the shared code down into the
generated python. That's out of the scope for these changes.
Change-Id: I3444bee3c2917dcbe92b600b85e60244129aad35
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Use the protocol-specific controller names in CHI.
**Important**: This could change some scripts. As long as people use
CHI_config (likely), this shouldn't be a problem, but if you have a
different version of CHI_config.py locally, you will need to make the
following updates:
`Cache_Controller` -> `CHI_Cache_Controller`
`Memory_Controller` -> `CHI_Memory_Controller`
Website updates coming soon!
Change-Id: I7afdcede884ac5f9a9a76cc3d3dd35941e4e2faa
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Use protocol-specific names in Learning gem5 configs. Now, we should no
longer use the generic names for the controllers (it's deprecated). This
updates Learning gem5.
Website changes coming soon. (Hopefull before I push this...)
Change-Id: I18fc5b8bb0fef7c3b8b5cea8de4f73fc0f66a1b3
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
This change is a step toward multiple protocols building at the same
time in scons. Add functions and use lists instead of single protocol.
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
This change makes it so that the MachineType.cc/hh file are not unique
for each protocol. All of the machine types are now tracked.
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Rename all SLICC generated SimObjects to have the protocol in their
name. This will allow for two different protocols to have the same
machine names (e.g., L1Cache). For compatiblity, we check to see if the
current or main protocol that is built matches the SimObject's protocol
and export the backwards-compatible name.
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Move the html output to be in a subdirectory with the protocol name.
Change-Id: I1510d2d5a531cc6db74d10a0478c23bc8a836a26
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Move all generated protocol-specific files to a subdirectory with the
protocol's name.
This change also updates SLICC to have separate variables for the
filename, c identifier and python identifier instead of just using
variations of the c identifier.
Change-Id: I62f69a4606b030ee23cb2d96493f3257a6923748
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Wrap all protocol-specific types in `namespace <protocol>`. This will
facilitate compiling multiple protocols into one binary.
There is a one-time hack to the generated `MachineType.cc` file to use
the namespace for the protocol until we generalize the machine types.
Change-Id: I5947e8ac69afe6f7ed257d7c5980ad65e9338acf
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
This changes extends SLICC to understand two different kinds of slicc
files: files that are protocol-specific and files that are shared or
included between different protocols.
Each declaration in SLICC can now be shared or not. If it is shared,
then we can take a different action in the code generation (e.g., wrap
in a namespace).
*Developer facing change*
Removes the RubySlicc_interfaces.slicc file from the SLICC includes of
every protocol.
Changes required: If you have a custom protocol, you will need to remove
the line `include "RubySlicc_interfaces.slicc" from your .slicc file.
Change-Id: Ia6c2dafe2b8fe86749a13d17daa885bddd166855
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
This extends gem5's version of python enums to support an equal operator
and the hash operator so we can compare two instances of enums and add
these to sets/dicts/etc.
Change-Id: I4a785bf9570a54254ada1db684379ee77e67b192
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Change inst_threshold param to inst_thresholds, which it is now
expecting a list of thresholds instead of one threshold.
Add more getter and setter functions:
addThreshold:
it is for adding new thresholds
getCounter:
it is for getting the current counter
getThresholds:
it returns the list of targeted thresholds
resetThresholds:
it clears all the targeted thresholds
Change-Id: I48d022effe7b315112ac150e6a4eaf5aab41c514
The listener pointer does not get deleted with the removeListener()
function call, so we need to make sure it is deleted in the
ProbeListenerObject.
Change-Id: I370f34651b889c8c00a378743e9c1c09fa1d775e
x86-global-inst-tracker.py:
- change the incorrect use of comment styly
- add more comments about the usage of the script and the purpose of
the script
src/cpu/probes/inst_tracker.cc:
- change the way of stopListening to use the manager function to remove
listeners. If in the future, the ProbeListner object does not call the
manager to remove itself in the destruction, then we should call it
here.
- fix stlying
src/cpu/probes/inst_tracker.hh:
- fix stlying
Change-Id: I6f3d745e15883a8a702593f72f984e0d4cc4c526
The GlobalInstTracker manages the global instruction counter and
responsible for triggering an exit event when the global instruction
counter reaches the defined threshold.
The LocalInstTracker listens to one core's retiredInsts probe point
and updates the GlobalInstTracker every time there is an instruction
committed.
The purpose of this instruction tracker is to raise an instruction
executed exit event with multi-core simulation.
Related discussion can be found:
https://github.com/gem5/gem5/issues/1087
Change-Id: Iab6fec57f14f28e590b035506282130ba8662706
Component that require randomness should not share their randomness
source with other components to avoid simulation noise. For instance,
the branch predictor of one core should not impact the random
cache replacement policy of the cache of another core. This currently
happens as all components share a single random number generator.
This PR provides their own generators to relevant components, although
a couple components still use rand().
Change-Id: I3fb7226111c9194ee457af0f0f2b83f8c7b69d1e
Co-authored-by: Arthur Perais <arthur.perais@univ-grenoble-alpes.fr>
This commit fixes three issues in MESI_Three_Level and MESI_Two_Level
implementations (MEI_Three_Level_HTM might still have issues).
1) Define functional read priorities for the cache controllers which
have states with Maybe_Stale access permission (L1 > L2 > Directory).
2) Fix incorrect access permissions in MESI_Three_Level-L1cache:
* S_IL0 is Read_Only, it is waiting for L0 to acknowledge the
invalidation request before moving to SS, also a Read_Only state.
* E_IL0 is Maybe_Stale, its contents might be valid, since there is a
transition (E_IL0, L0_Ack, EE) with no writeback data.
* M_IL0 is Maybe_Stale, its contents might be valid, since there is a
transition (M_IL0, L0_Ack, MM) with no writeback data.
3) Add missing message types carrying valid data in functional reads:
* INV_DATA is a writeback from L0 to L1.
* DATA is a response to GET_S, but there are scenarios where it might
be the only place with valid data (e.g. during L2 replacement).
Change-Id: Ie44fa317027f9ede272967e7461d337e14355eec
Functional reads can be satisfied by one of the following, in order:
1. Main memory (when the data is not present in the cache hierarchy);
2. Valid data block in cache;
3. Valid data block in coherence message;
4. Valid data block marked as Maybe_Stale;
Number 4 is not handled by the current implementation. A Maybe_Stale
block can be either truly stale or actually valid. When it is stale,
the memory read will be satisfied by either number 2 or number 3. When
it is valid, there will be no coherence message with valid data inside,
and the Maybe_Stale block will transition to a valid state after
receiving some kind of acknowledgement.
The main challenge to handle number 4 is how to know from which
Maybe_Stale block the data should be read from. For instance, in a two
level cache hierarchy, we might have a block marked as Maybe_Stale in
both L1 and L2. In this case, we should prioritize the cache controller
that is closest to the CPU. To define this priority, a new virtual
function 'functionalReadPriority' was added to the AbstractController
class.
Change-Id: I4774cd01aab7bb9ca53694cd9dc4f9416a8e4025
The getValidAddr is the method get virtual address with valid bits. It
is useful to get the correct symbol table via valid virtual address.
For ARM, we have `purifyTaggedAddr` to get the right virtual address.
For RISC-V, we only get lower 32 bits in RV32 mode to get the right
symbol table.
Change-Id: I33ad7bec6e7ea4ec82cb1b3a7f521432c6d735b6
As of PR #977, gem5 has a defined M5OPS_ADDR for arm64, even if it is
constrained to certain conditions. With that change and the arm64 board
KVM support (#725), it seems like interaction with the m5 utility under
arm64 will most commonly occur in KVM -- where instruction-mode does not
work -- and thus address-mode becomes more desirable as the default.
This also makes m5's behavior in arm64 consistent with x86, the only
other architecture that supports address-mode operations.
The PR https://github.com/gem5/gem5/pull/1316 changes the sign-extend
address generation. We also need to sign-extend the address when setting
the PC in enter/leave trap handler
Change-Id: I62d58a26dba0b0c64125fea8ac9376ebf55c4952
The number of register pins for the vector gather instructions was not
calculated correctly because the micro vl was not right. This caused
some micros to rename a new register instead of using a pinned one.
This PR improves the legacy RISC-V FS Linux script in the following
ways:
- Adds an argument to specify the bootloader, to (optionally) use the
`RiscvBootloaderKernelWorkload` class.
- Updates the DTB generation function adding the Chosen node. This
fixes the execution with recent Linux kernels.
- Checks if the `--kernel` required argument is set.
Previously, GPU L2 caches could be configured in either writeback or
writethrough mode when used in an APU. However, in a CPU+dGPU system,
only writethrough worked. This is mainly because in CPU+dGPU system, the
CPU sends either PCI or SDMA requests to transfer data from the GPU
memory to CPU. When L2 cache is configured to be writeback, the dirty
data resides in L2 when CPU transfers data from GPU memory. This leads
to the wrong version being transferred. A similar issue also crops up
when the GPU command processor reads kernel information before kernel
dispatch, only to incorrect data. This PR contains a set of commits that
fix both these issues.
The method is defined as const but the caller will actually modify the
content of the structure directly with the pointer in
BaseRemoteGDB::cmdRegW. The member access in the const method are
actually treated as const and will cause error if we use
reinterpret_cast instead.
Remove the const tag to align the expectation of the virtual method.
One of the perks of the previous TLB storage implementation [1] is that
its custom implementation of LRU exploited temporal locality to speed up
simulation performance
TlbEntry tmp_entry = *entry;
for (int i = idx; i > 0; i--)
table[i] = table[i - 1];
table[0] = tmp_entry;
return &table[0];
In other words the matching entry was placed as the first entry of the
TLB table (table[0], top of LRU stack). In this way a following lookup
would encounter it as the first entry while looping over the TLB table,
therefore massively reducing simulation time when temporal locality is
present
(most of TLB table loops would find a match in the first iteration).
int x = 0;
while (x < size) {
if (table[x].match(lookup_data)) {
With the new implementation we decouple TLB storage from the replacement
policy. The result is a more flexible implementation but with the
drawback of a slower lookup/search. We therefore we need to find another
way to exploit temporal locality. This patch addresses it by caching a
previously matched entry in the TLB table
[1]: https://github.com/gem5/gem5/blob/v24.0.0.0/src/arch/arm/tlb.cc
Change-Id: Id7dedf5411ea6f6724d1e4bdb51635417a6d5363
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The MMU already stores a pointer to the release object, so it can query
it directly to check for PAN instead of relying on the slower HaveExt
helper
Change-Id: Ie3a186aa1d65955cff4a40871bde1ee78aa36ec0
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Two NetDest locally declared variables are using default constructor
instead of constructor with RubySystem pointer. This will cause asserts
when (1) garnet is used or (2) a protocol that uses `broadcast()` is
built.
Fix these two by passing the appropriate RubySystem pointers.
In our usecase, we'd like to intercept some gadgets in some gem5 ports,
and register them to a Python-level collection. The registered name is
the string from C++ constructor argument (portName), and it would be
great if we can access that from Python-level as well. This commit
enable this by exporting a py-binded method to access the portName.
Change-Id: I93398697536f27a52d3a1dd0e658fcb321b9e293
With this PR we replace the TlbEntry storage within the TLB from an
array of entries with a custom hardcoded FA indexing policy and LRU
replacement policy, into the flexible SetAssociative cache.
purifyTaggedAddr is known to be an expensive computation regardless of
the memoization we do, as it sits in the critical path from a host
performance point of view (instruction fetch).
In checkPermissions64 we compute it without really needing the tag
purification. The only place where it is used is to check for
PCAlignment, but the alignment checks the 3LSBs whereas a potential tag
would be stored in the most significant ones
Change-Id: I9f39db658c3575dcbacb5351813ff9bb3775046d
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
PR gem5#1453 left some unused variables in the ruby code that triggered
"unused variable" warnings found comiling ALL/gem5.opt to use the CHI
protocol. These have been removed.
In #1453, an `implicit_ctor` option was added for SLICC structures. This
was done to allow statements such as `NetDest tmp;` which now require a
non-default constructor without modifying every protocol. The new
`implicit_ctor` option converts the statement `NetDest tmp;` in SLICC to
`NetDest tmp(<implicit_ctor>);` in C++. This is problematic when doing
something like `NetDest tmp := getMachines(...);` which gets converted
to `NetDest tmp(<implicit_ctor) = getMachines(...);` as the constructor
doesn't return an object. Before #1453 NetDest had a default constructor
so there we no difference between a local variable definition and local
variable assignment.
This commit fixes this issue by checking in the LocalVariableAST if the
local variable is part of an assignment or not. If it is not part of an
assignment, the implicit_ctor is used. Otherwise, the assignment is
printed to the generated code.
Note that this is not done anywhere in the public code but should be
allowed for folks writing their own Ruby protocols who might otherwise
be confused why a simple assignment presents a compile error.
This PR adds support for command line arguments in GPU-FS runs to allow
the user to configure several parts of the GPU. It also increases the
bits per set in the build_opts/VEGA_X86 file to enable GPU-FS
simulations to use 64 directories or more.
This commit addresses the requested changes. An additional
comment is added for clarification, the exception type is
changed, and a few of the error messages have been
modified.
Fix#1384.
MESI_Two_Level and MESI_Three_Level protocols are susceptible to LL/SC
livelocks when simulating boards with high core count.
This fix is based on MOESI_CMP_directory's implementation of locked
states, but tailors the solution to only apply it when a Load-Linked is
initiated.
There are two new states to act as locked states and stall any messages
leading to eviction:
* LLSC_E: equivalent to E state, go to E after timeout.
* LLSC_M: equivalent to M state, go to M after timeout.
The main new event is Load_Linked, which is very similar (in behavior)
to a Store, reusing several transient states. When a controller receives
the exclusive data, it differentiates a Load_Linked from a Store by
checking a new field added to the TBE: 'isLoadLinked'. It triggers a
different event when it is a Load_Linked, which in turn causes the
transition to one of the locked states.
The entire mechanism can be turned off by setting 'use_llsc_lock' to
false, and the amount of time to keep locked is defined by
'llsc_lock_timeout_latency'.
Change-Id: I13f415b6b7890d51d01f23001047d2363467a814
A previous PR mistakenly [1] replaced translateFunctional with
translateAtomic. This commit is reverting that
[1]: https://github.com/gem5/gem5/pull/1697
Change-Id: I945c3fe59cea36732d9f30109b950d4114aa8fad
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This PR fix the bug of vsetivli frequently flushing the pipeline.
Here are two pictures of the pipeline illustrate this phenomenon.


The vsetivli(0x00013334.0) instruction in the first picture flushes the
pipeline every time it is executed. This is due to vsetivli being
incorrectly flagged as a 'DirectControl' instruction. The branch
predictor cannot predict it correctly.
The second picture is the pipeline after fixing the bug.
Change-Id: I5bede47919c06cea86fa23a81624b502fbdc1159
This commit adds SE mode to RiscvBoard. RiscvDemoBoard has also
been modified as adding SE mode to RiscvBoard made the
overridden functions in RiscvDemoBoard obsolete.
This commit adds SE mode to X86Board. X86DemoBoard was also modified,
as functions that were previously needed to add SE mode to
X86DemoBoard were removed.
This PR adds a RiscvDemoBoard that can be used with both SE and FS
mode.This was tested using the workloads riscv-matrix-multiply-run for
SE and riscv-ubuntu-20.04-boot for FS. Two example config scripts have
also been added.
Add decoder and function of AArch32 VRINTN, VRINTX, VRINTA, VRINTZ,
VRINTM, and VRINTP (Advanced SIMD) instructions. Support both 16-bit and
32-bit variants.
Add vfpFPRint in vfp.hh to perform the behavior of round-to-integer.
Only support A32 encoding.
Change-Id: Icb9b6f71edf16ea14a439e15c480351cd8e1eb88
Fix#1682. Treat LEA as a BigLdStOp. BigLdStOps (as well as other Big*
x86 uops) do not have input dependencies on 32-/64-bit destinations. LEA
will still have input dependencies on 16-bit destinations. (LEA cannot
have an 8-bit destination.)
Change-Id: I5d0678e6bd79bfd6064941a89c6fe290750543c9
Moving the address translation logic outside of the ISA::setMiscReg will
allow it to return and potentially invoke a fault
upon execution of the AT instruction. This change affects AArch64 mode
only
Originally, the debug print for read/write to specific register name
will happen after reg.read() and reg.write(). However, there might be
other debug print or warning inside reg.read(), reg.write() which would
be confusing if this debug log happen after all other debug print inside
reg.read(), reg.write().
Creating this commit to change the order.
Starting with https://github.com/gem5/gem5/pull/1453 , some Ruby
structures require a block size be set
and other require a pointer to the Ruby system. This fixes some cases
which were not covered by the per-checkin tests but seen in daily+
tests. In particular:
- WriteMasks and PerfectCacheMemory must explicitly set a block size.
- NetDest and RubyProxyPort require RubySystem pointer.
- Classes inheriting Message now have a setRubySystem collecting all
objects that need a RubySystem pointer and this should be called in
the constructor of the Message.
This commit makes sure all of these happen. This should fix daily
arm_boot_tests and daily learning_gem5 tests.
At some point 'system' -> 'board' in the stdlib code the replacement
policy tests used. Due to this the output is slightly different meaning
the refs need updated.
This was causing the Daily Tests to fail.
Move AT instructions out of setMiscReg.
Modification includes:
- Add template for AT instructions in misc64.isa.
- Add decoder and execution of AT instruction in aarch64.isa and
data64.isa.
- Add AtOp64 and AtOp64Hub to perform the behavior of AT instructions.
Change-Id: I7e8b802421f7335203edb9f8d748ad8669954b8c
This missing parameter causing the Learning gem5 tests to fail.
**Note:** We need to update the website's learning gem5 examples to
reflect this change.
Modifies union construction in the debug directory so output is more
amenable to alternative compilers. Verified that this change produces
code that builds with clang, gcc, msvc, nvhpc, aocc, icc, openxl, and
cray hpc.
These were the kinds of errors seen in MSVC, which this patch fixes.
```
debug/Decoder.hh(24): error C2461: 'gem5::debug::unions::Decoder': constructor syntax missing formal parameters
debug/Decoder.hh(31): error C7624: Type name 'gem5::debug::unions::Decoder' cannot appear on the right side of a class member access expression
```
This PR addresses comments from #1584
- removed tests using the same binary multiple times. Each binary is
tested once with one graph
- Updated the input sizes as per the comments in the above mentioned PR
This PR adds the SMS prefetcher described in [this
](https://web.eecs.umich.edu/~twenisch/papers/isca06.pdf) paper.
This work was done in collaboration with @Setu-Gupta, and @xmlizhao
On branch sms
Changes to be committed:
modified: src/mem/cache/prefetch/Prefetcher.py
modified: src/mem/cache/prefetch/SConscript
new file: src/mem/cache/prefetch/sms.cc
new file: src/mem/cache/prefetch/sms.hh
Change-Id: I68d3bb6cf07385177d0f776fb958f652cfc41489
With this commit we replace the TlbEntry storage within the TLB from an
array of entries with a custom hardcoded FA indexing policy and LRU
replacement, into the flexible SetAssociative cache.
Change-Id: Ia74ff6962ac8195802b51dcc0caa516965f0ce37
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
KeyType definition is required if we want to store the TlbEntry
within an AssociativeCache. We could add an alias and keep the
Lookup name but this will just create extra confusion
Change-Id: Ib0b7c9529498f0f6f15ddd0e7cf3cec52966e8df
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
This is needed for compliance with the AssociativeCache
container. It will call the invalidate method when
invalidating the TLB entry
Change-Id: Idb1bc40b5aea8c475146700c81ab79d9980f745d
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
With this commit we record the page sizes of all valid
TLB entries in a TLB.
We update the set conservatively, therefore allowing
false positives but not false negatives.
This information will be used when doing a page size based
lookup. At the moment we don't strictly need it as we
iterate over all TLB entries (the TLB implements a fully
associative cache) and if we find multiple matches, it means
we have stored some partial translations.
The existing logic is prioritizing complete translations
over partial translations and among the latter, late stage
translations over early stage (with the idea to minimize
the number of walks).
The "iterate once over the entire TLB and record all matches"
won't work well when we shift from a fully associative
TLB into a set associative. With the introduction of the
aforementioned set, we can do page size based lookups,
so we can explicitly lookup the TLB for a specific page size
therefore looking into the appropriate set for a match
Change-Id: If77853373792d6a5ec84cf1909ee5eb567f3d0e4
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Some ISAs (like Arm) have moved most of the translation logic into
the MMU and use the TLB simply as translation storage. It makes
sense to use the MMU debug flag for that logic and reduce the
scope of the TLB flag to TLB insertion/hits/misses
Change-Id: I2a164545c711d83d3e87075b0cb5c279eed274c9
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
This commit is moving some MMU methods definition in the
source file from the header to avoid including tlb.hh
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Change-Id: I8fb1aeccd9c38c48b09583b4dc5d152acd09c3e6
It is a simple enum to distinguish between short and big
descriptors. By moving it away from the ArmFault we can
avoid including fault.hh from mmu.hh
Change-Id: Ib556b577c62f5ea3e4c8c9e0d4560a3e99c96778
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
We are implementing derived classes of SignalSinkPort that does some
additional logic after it's triggered (set() invoked by SignalSourcePort
peer), and before executing the callback that a device provides (in
onChange_). The logic is like additional logging, or providing debugging
features. However, set() itself directly calls the onChange_ callback.
Making the set() virtual could provide the flexibility to achieve this
feature.
This demo board is a preset arm board, that can be used to run example
gem5 simulations. This board doesnt simulate any known hardware.
The board will be used to run benchmarks such as gapbs and npb to
collect stats. The plan is to show these stats on the gem5 resources
website to provide more details about the resources.
1. Implement Zcmp(cm.push, cm.pop, cm.popret, cm.popretz, cm.mva01s,
cm.mvsa01) instructions
2. The Zcd instructions overlap the Zcmp and Zcmt instruction. This
option is used to enable/disable Zcd extension, implies enable Zcmp/Zcmt
extension. If Zcd is enable, the Zcmp and Zcmt is disabled. Otherwise,
Zcmp and Zcmt is enabled.
Spec: https://github.com/riscv/riscv-isa-manual/blob/main/src/zc.adoc
- This change updates syntax of constructors of Template Classes from
`class<T>()` to `class()`
- Initializes coherence to 0 in `src/mem/cache_blk.hh`
The above changes are made to solve the errors when compiling gem5 in
gcc 14
This commit modifies X86DemoBoard so it has numbers more similar to that
of RiscvDemoBoard and ArmDemoBoard. It also adds SE mode to
X86DemoBoard. Note that the changes here depend on the changes in PR
1579.
**Note**: This PR was created so @BobbyRBruce could add his commits to
#1600
---------
Co-authored-by: Erin Le <ejle@ucdavis.edu>
There were a number of variables in the arm and x86 decoders that are
static (e.g., the decode cache). It's a bit interesting that this
doesn't cause problems with multiple cores since each core has its own
decoder.
However, this causes segfaults if you run different cores on different
*host* threads. We are experimenting with running gem5 with multiple
host thread (i.e., in parallel), and removing these static variables
resolves the segfault.
This change also adds const to any other static variables to ensure that
they cannot be modified.
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
The Zcd instructions overlap the Zcmp and Zcmt instruction
This option is used to enable/disable Zcd extension, implies enable
Zcmp/Zcmt extension. If Zcd is enable, the Zcmp and Zcmt is disabled.
Otherwise, Zcmp and Zcmt is enabled.
Spec: https://github.com/riscv/riscv-isa-manual/blob/main/src/zc.adoc#zc-overview
Change-Id: I3788eb6539e13a210c9946efc43ca1fef4639560
There are two issues related to setting an element in PackedReg where
the element spans multiple dwords. First, the mask value is wrong and is
clobbering both dwords. Second, a portion of the value is shifted out of
the narrower input type.
Fix this by using the correct mask to clear the bits where the value
will be placed and use a larger data type to shift the value into place.
* Deprecates the setting of FS/SE mode via the `Simulator` module.
* Moved the creation of the `Root` object from the `Simulator` to the
board.
* Moved the setting of `sim_quantum` from the `Simulator` to the
processor.
* Allows for easier development of boards which support both SE and FS
mode simulation by moving board setup function calls to occur after the
set_workload function is call which sets a boards stats `is_fs` status.
This pops up in kernel 6.8.0. The device it is trying to write is
currently unknown but does not cause problems ignoring the device,
therefore change the panic to a warning and responding to the request
with the default PCI latency.
Change-Id: I4c1229753a75a94a255d8cfd411ac7311283366b
'ext' is set as a Python source path for gem5, like 'src/python'. It
helps vscode users to have vscode aware of this to better analytics and
reduce warnings (most comminly "unable to resolve import).
'tests' isn't in the Python source path when compiling gem5 but it is
when running `tests/main.py`. Though somewhat unideal as is lets vscode
think files in 'src' can import from files in 'test', adding this helps
vscode Python analytics parse the test files which reduces warnings and
aids in betters navigation of the testing code. This is particularly
helpful given the complexity of the testlib testing infrastructure.
With this patch the pannotia tests now:
1. Download the resources to 'gpu-pannotia' in the
'tests/gem5/resources' directory. This is where other test resources are
store.
2. Download thr USA-road-d.NY.gr dataset from Google cloud bucket in a
decompressed state.
2. Avoid re-download the resources if they are already present on the
host machine.
`edited` is what forces a re-run of our tests when the PR title is
updated and other minor metadata stuff. I believe all changes to the
code are covered by the remainder. `synchronize` is means the PR is
triggered with the when the this PR is from (in this case my forked gem5
repo) is synced with the PR branch here. This covers the vast majority
of cases we care about. `opended` covers for the case where the PR is
created and `ready_for_review` for when something moves out of a draft.
Ruby requires each machine type to have a continuous set of version
numbers starting at 0. We were hiding this from users/developers by
using a Python class variable in the stdlib. Unfortunately, with
multiple ruby systems this doesn't work anymore.
As a stop-gap this change adds "resetting" these versions to the
beginning of `incorporate_caches`. It would be better to fix this in the
C++ code (and assign these numbers in C++ probably via the RubySystem),
but that's a bigger change than is needed right now.
---------
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Add decoder and function of AArch32 VRINTN, VRINTX, VRINTA, VRINTZ,
VRINTM, and VRINTP (Advanced SIMD) instructions. Support both 16-bit and
32-bit variants.
Add vfpFPRint in vfp.hh to perform the behavior of round-to-integer.
Only support A32 encoding.
Change-Id: Icb9b6f71edf16ea14a439e15c480351cd8e1eb88
Add decoder and function of AArch32 VCVTA, VCVTP, VCVTN and VCVTM
instructions. Support both 16-bit and 32-bit variants.
Only support A32 encoding.
Change-Id: I6ece0e1b779f9a7cc9d709894a49a7fdcda28373
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This PR fixes the bug where simInsts and simOps don't reset when
m5.stats.reset() is called. The stats hostInstRate and hostOpRate are
affected by this change as well, as they depend on simInsts and simOps
respectively.
This is related to issue 1443 linked
[here](https://github.com/gem5/gem5/issues/1443).
This allows vscode to resolve python imported from "src/python".
Warnings regarding these imports are numerous and the issue stops users
of vscode to utilizubg features like navigating the codebase though "Go
to Definition" queries on imported classes/functions.
Previously, whether the board object or the memory_system returned
the memory ports was not consistent in the cache_hierarchies
This commit makes it consistently use the board. Note: the board
is a better place so it can customize the ports (e.g., add I/O
components or other things.
This commit also makes the arm board consistent with the other
boards and removes the specialized `get_mem_ports` that was not
used.
This change allows for the `_setup_memory_range` and `_setup_board`
functions to know if the board is to run a FS or SE workload, thus
allowing for a baord to handle both cases considerably easier than
before. With this change all functions are called after FS or SE
is declared via the `_set_fullsystem` function and thus all can
accomodate for SE and FS workloads.
Where appropriate utilize caching of ALL/gem5.opt or VEGA_X86/gem5.opt.
The cache key is just the date returned by the runner. This is unlikely
the most efficient solution but it is simple and difficulties were
encountered when attempting to create a hash of This solution will do
for now.
The previous https://github.com/gem5/gem5/pull/1617 introduce the CLINT
reset feature. When reset, we changed the mtime to 0 and keep mtimecmp
unchanged by default, we also need to check mtime & mtimecmp regiter to
update the MTI signal. However, the mtime register will be incremented
to 1 by `raiseInterruptPin`.
In the PR, we introduced the interrupt ID for CLINT, the mtime will be
incremented only if received the RTC signal
---------
Co-authored-by: Jason Lowe-Power <jason@lowepower.com>
There are several parts to this PR to work towards #1349 .
(1) Make RubySystem::getBlockSizeBytes non-static by providing ways to
access the block size or passing the block size explicitly to classes.
The main changes are:
- DataBlocks must be explicitly allocated. A default ctor still exists
to avoid needing to heavily modify SLICC. The size can be set using a
realloc function, operator=, or copy ctor. This is handled completely
transparently meaning no protocol or config changes are required.
- WriteMask now requires block size to be set. This is also handled
transparently by modifying the SLICC parser to identify WriteMask
types and call setBlockSize().
- AbstractCacheEntry and TBE classes now require block size to be set.
This is handled transparently by modifying the SLICC parser to
identify these classes and call initBlockSize() which calls
setBlockSize() for any DataBlock or WriteMask.
- All AbstractControllers now have a pointer to RubySystem. This is
assigned in SLICC generated code and requires no changes to protocol
or configs.
- The Ruby Message class now requires block size in all constructors.
This is added to the argument list automatically by the SLICC parser.
(2) Relax dependence on common functions in
src/mem/ruby/common/Address.hh
so that RubySystem::getBlockSizeBits is no longer static. Many classes
already have a way to get block size from the previous commit, so they
simply multiple by 8 to get the number of bits. For handling SLICC and
reducing the number of changes, define makeCacheLine, getOffset, etc. in
RubyPort and AbstractController. The only protocol changes required are
to change any "RubySystem::foo()" calls with "m_ruby_system->foo()".
For classes which do not have a way to get access to block size but
still used makeLineAddress, getOffset, etc., the block size must be
passed to that class. This requires some changes to the SimObject
interface for two commonly used classes: DirectoryMemory and
RubyPrefecther, resulting in user-facing API changes
User-facing API changes:
- DirectoryMemory and RubyPrefetcher now require the cache line size as
a non-optional argument.
- RubySequencer SimObjects now require RubySystem as a non-optional
argument.
- TesterThread in the GPU ruby tester now requires the cache line size
as a non-optional argument.
(3) Removes static member variables in RubySystem which control
randomization, cooldown, and warmup. These are mostly used by the Ruby
Network. The network classes are modified to take these former static
variables as parameters which are passed to the corresponding method
(e.g., enqueue, delayHead, etc.) rather than needing a RubySystem object
at all.
Change-Id: Ia63c2ad5cf0bf9d1cbdffba5d3a679bb4d3b1220
(4) There are two major SLICC generated static methods:
getNumControllers()
on each cache controller which returns the number of controllers created
by the configs at run time and the functions which access this method,
which are MachineType_base_count and MachineType_base_number. These need
to be removed to create multiple RubySystem objects otherwise NetDest,
version value, and other objects are incorrect.
To remove the static requirement, MachineType_base_count and
MachineType_base_number are moved to RubySystem. Any class which needs
to call these methods must now have a pointer to a RubySystem. To enable
that, several changes are made:
- RubyRequest and Message now require a RubySystem pointer in the
constructor. The pointer is passed to fields in the Message class
which require a RubySystem pointer (e.g., NetDest). SLICC is modified
to do this automatically.
- SLICC structures may now optionally take an "implicit constructor"
which can be used to call a non-default constructor for locally
defined variables (e.g., temporary variables within SLICC actions). A
statement such as "NetDest bcast_dest;" in SLICC will implicitly
append a call to the NetDest constructor taking RubySystem, for
example.
- RubySystem gets passed to Ruby network objects (Network, Topology).
There was a bug exposed by a recent PR [1] where until recently the O3
CPU was executing an instruction even if it did not have the required
functional unit in the FU pool.
We are adding the matrix descriptors to the Default FU pool in the O3
cpu so that no panic is encountered upon executing of a matrix
instruction
[1]: https://github.com/gem5/gem5/pull/1516
Change-Id: I04250255a2cbb2ee6f3ef204b62bc2c1ee2d4d2c
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
There was a bug exposed by a recent PR [1] where until recently the O3
CPU was executing an instruction even if it did not have the required
functional unit in the FU pool.
We are adding the crypto descriptors to the Default FU pool in the O3
cpu so that no panic is encountered upon executing of a crypto
instruction
[1]: https://github.com/gem5/gem5/pull/1516
Change-Id: Ifaf2f8e4780dfb8ba825a99a02dd587f011dbd23
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The Daily tests have been failing as the learning-gem5 Ruby test now
exits at tick 9831 instead of tick 9981.
**Note**: The cause of this change is currently unknown. I'm not sure if
this is symptomatic of something bigger but for now I only observe this
bug failure and this patch at least silences the error.
The clone3 syscall, implemented in commit 87e774c, is currently only
handled for x86-64 in gem5 SE mode. Clone3 is employed by modern glibc
versions instead of clone for processes/threads generation (e.g. issue
#1204). This commit enables the clone3 syscall in riscv64 by adding the
corresponding handler call, as well as its arguments struct.
The ROM field was originally intended as a future alternate way to load
VBIOS without the ROM being on the disk image. This code path is never
taken for the devices gem5 supports and there is no gem5 implementation.
Deprecate the rom_binary field for this reason.
Similarly, MMIO traces were only used for Vega10. Deprecate this as
Vega10 is now deprecated. The MMIO trace reader is kept as it may still
be useful in the future. It is still the primary way to handle devies
which have graphics capability. None of the devices supported by gem5
have graphics now that Vega10 is deprecated.
Hashing the `src` directory is too costly, with some runners reaching
timeout. Also, as we only have 10GB of cache it makes sense to have
more course grained caching
The setting of the `sim_quantum` parameter makes considerably more sense
to occur in the Processor. Through the `_pre_instnatiate` functions this
is now possible.
It makes much more sense for the Root Object to be create within the
board and passed where required. Creating it in the Simulator class is
not required.
For this to work the signuature of the `_pre_instantiate` function in
`AbstractBoard` has been updated to return the Root object.
THis is deprecated in favor of the board determining whether the
simulation is FS or SE. Usually this will be contingent on which
`set_workload` funciton has been called. Regardless, it is the board's
responsibility. The user should not need to explicitly declare this any
longer.
The Weekly GPU tests are failing due to a timeout, but I found the
testing timeout was set to 5 hours, and we have been frequently close to
reaching this but have recently changed the test enough to consistently
go over.
The main two things that appear to have caused this are:
~~1. Moving the X86_VEGA compilation into the same step as the running
of the tests.~~ (I take this back, the timeout is per-job, it shouldn't
matter how stuff is deivided among steps in the job. However, keeping it
separate does no harm and merging the two steps did coincide with
failures occurring. I'll play it safe for now_.
2. Reducing the number of threads per GitHub Actions runner, thus
slowing job execution.
In addition, we've added more tests to this weekly GPU suite, though I
don't believe we have got to running these tests yet. The timeout
appears to always have been triggered before this.
This PR increases the timeout to 3 days and moves the compilation into a
separate step.
**Update: Same changes done for Daily tests too as it appears to be the
same problem.
The Weekly GPU tests are failing due to a timeout but I found the testing
timeout was set to 5 hours and we have been frequently close to reaching this
but have recently changes the test enought o consistently go over.
The main two things that appear to have caused this are:
1. Moving the X86_VEGA compilation into the the same step as the running of
the tests.
2. Reducing the number of threads per GitHub Actions runner, thus slowing
job execution.
In addition we've added more tests to this weekly GPU suite though I don't
believe have got to running these tests yet. The timeout appears to
always been triggered before this.
This PR increases the timout to 3 days and moves the compilation into a
seperate step.
Two faults:
1. You can't give description the docker-bake file for single platform
builds. They must be in the Dockerfile..
2. The gpu docker image def in docker-bake.hcl was not overriding the
"common" setttings as previously thought. This was causing builds to
something build the wrong platform and vairous other weird bugs. This
has been fixed in this patch.
Invalidate requests align to system cache line size. This causes
problems if the GPU cache hierarchy's cache line size is different than
the system as the unlaigned requests never return, leading to deadlock
on deferred dispatch.
This commit uses the cache line size from the GPU memory manager and
makes the cache line size there non-optional.
Tested with multiple RubySystems where CPU side was 64B and GPU side was
128B cache lines.
Vega10 is no longer officially supported by ROCm and ROCm is starting to
use some packet types not supported. These were originally kept to allow
users to use older disk images with newer gem5. Going forward the gem5
version and gem5-resources releases will be required to be the same to
prevent lingering old configs.
As a replacement for vega10*.py, mi300.py or mi200.py should be used.
HIP examples, cookbook, and rodinia configs can be replaced with the
standard flow of building / obtaining the GPU application and running
using mi300.py or mi200.py as they do not require any input options and
therefore do not require changes to the disk image.
The clone3 syscall, implemented in commit 87e774c, is currently only
handled for x86-64 in gem5. Clone3 is employed by modern glibc versions
instead of clone for processes/threads generation (e.g. issue #1204).
This commit enables the clone3 syscall in riscv64 by adding the
corresponding handler call, as well as its arguments struct.
FMAXV, FMINV, FMAXNMV, FMINNMV and ADDV instructions perform recursive
reduction. Different reduction methods lie to different result when
handle NaN values.
Reuse the template of `twoRegAcrossInstX`. Add one more option
`recursive` for recursive reduction.
Change-Id: I69e690ce7668baee818542d3ea463f7a5f269a69
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This commit fixs a bug in the viota instuction.
The two different instructions can be referenced to the same
StaticInstPtr because the decoder behaves as shown in [the section of
the
code](https://github.com/gem5/gem5/blob/stable/src/arch/riscv/decoder.cc#L98-L100).
So every first micro-op should reset the cnt variable in the macro-op.
Change-Id: Id311a05cfed41b01e16fd7256d9baa166aee49da
Co-authored-by: Jack Yung-Chen Lin <jack622@andestech.com>
This commit changes metric units (e.g. kB, MB, and GB) to binary units
(KiB, MiB, GiB) in various files. This PR covers files that were missed
by a previous PR that also made these changes.
This change adds MADT entries to the X86Board. Previously, the kernel in
full-system mode was complaining about a `ACPI BIOS Error (bug): Invalid
table length 0x24 in RSDT/XSDT (20190816/tbutils-291)`. This patch fixes
the invalid length and initializes all the tables correctly.
Signed-off-by: Kaustav Goswami <kggoswami@ucdavis.edu>
FMAXV, FMINV, FMAXNMV, FMINNMV and ADDV instructions perform recursive
reduction. Different reduction methods lie to different result when
handle NaN values.
Reuse the template of `twoRegAcrossInstX`. Add one more option
`recursive` for recursive reduction.
Change-Id: I69e690ce7668baee818542d3ea463f7a5f269a69
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This PR is doing a simple refactoring of some partitioning policies. It
moves existing functionalities
within PP methods so that they can be called multiple times throughout
the simulation.
Therefore allowing a dynamic adjustment of the partitioning scheme
- Add `isExternalAbort()` in `AbortFault<T>` to determine external
abort.
- Add `virtual isExternalAbort()` in `ArmFault` so the method can be
used in base class.
- Set iss.ea by `isExternalAbort()`
- Add `isExternalAbort()` in `AbortFault<T>` to determine external abort.
- Add `virtual isExternalAbort()` in `ArmFault` so the method can be
used in base class.
- Set iss.ea by `isExternalAbort()`.
Change-Id: I01c22dc46958ab424b389af96d3c3b6243cbc671
The External Data Abort may not set TranMethod, and it leads to assert
error.
- Make `ArmFault::update` virtual.
- Implement override `update` in `AbortFault<T>` to set TranMethod.
Change-Id: I49e18799df8420b214b6059ffa756a13edf343d5
This will allow gem5 to configure the maximum capacity of a
partition dynamically during simulation, rather than
having it statically defined at construction time
Change-Id: Ib55c9990a6bc2930abaf2438c13337acc643520f
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
In this way we actually need to store one unsigned integer instead of
two. We also won't need to recompute the total number of cache blocks
whenever we will adapt this policy to be dynamically modified
Change-Id: Ia8cf906539d1891b6cdb821f2a74628127dc68c6
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Add decoder and function of AArch32 VCVTA, VCVTP, VCVTN and VCVTM
instructions. Support both 16-bit and 32-bit variants.
Only support A32 encoding.
Change-Id: I6ece0e1b779f9a7cc9d709894a49a7fdcda28373
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Replace std::uniform_*_distribution by custom code
to make random number generation in gem5 portable across
compilers.
Of note, FP random number generation was not uniformly
distributed, and this PR does not fix that issue.
Thanks to Chandana S. Deshpande (deshpande.s.chandana@gmail.com)
for uncovering the issue.
Co-authored-by: Arthur Perais <arthur.perais@univ-grenoble-alpes.fr>
This refactor attempts to homogenize all riscv's vector (macro/micro)
instruction classes so that ELEN and VLEN are guaranteed to be a class
attribute. Since both are constant, all instructions will get it on the
decoding process passed through to their vector base class.
This allows the removal of VLEN in the PC state and also in some
constructor default parameters (solves issue #1207).
Change-Id: I6f0471004335f49b00b015c37e95dc7f9569e303
Move getRvType & getPrivilegeModeSet static methods into
RiscvISA::RemoteGDB virtual methods allows the derived
RiscvISA::RemoteGDB to override it without change a lot of methods in
base methods
Change-Id: I3cbb9cf1fdee4a298e903bb4a0a5683c042b749d
64kB, in these cases, will cast to 64KiB regardless. To improve
readability and understanding of these objects, this patch changes there
SI Prefix (kB -> KiB).
System(Misc) register accesses are not the only trappable instructions.
We move the exception generation logic (generateTrap) from the
MiscRegOp64 to the base ArmStaticInst
There appears to have been an assumption here that `Popen` would raise
an exception if the command run returned non-zero. This is not the case.
This commit fixes this by obtaining the return code and throwing an
exception if it is non-zero.
This bug caused some minor issues as Exception handling code to handle
the non-zero case elsewhere in Scons was never executed.
This reverts commit 52fbc8ebcf.
This commit used Ubuntu 22.04 instead of the typucal 24.04 as 24.04
has GCC v13 installed by default. GCC v13 (and new compilrs introduce a
'oerloaderdf-virtual' check that is triggered in systemc. Systemc
developers suggest this fix to proceed.
1. Added `sudo` to Ubuntu 24.04 all dependency Dockerfile
Without this an admin user entering a container mirroring host user
permissions can't run `sudo` within the container as it doesn't exist.
They also can't install it as `apt install` requires `sudo`.
As 24.04_all-deps serves as the base images for other images, this
change will be reflected in most other gem5 Docker images.
2. Fix multiplatform builds by removing `BUILDPLATFORM` platform fix.
This actually breaks multi-platform builds when using docker buildx via
the docker-bake.hcl file. Removing this fixes and permits the
multi-platform builds to be built.
3.Remove 'latex/riscv64' as Docker build target
It is unlikely anyone will be running these images on a RISC-V system
anytime soon. They are costly in terms of space and also require RISC-V
emulation to build which is very slow. This change has it so our
multi-platform builds just target ARM and X86.
Without this an admin user entering a container mirroring host user
permissions can't run `sudo` within the container as it doesn't exist.
They also can't install it as `apt install` requires `sudo`.
As 24.04_all-deps serves as the base images for other images, this
change will be reflected in most other gem5 Docker images.
It is unlikely anyone will be running these images on a RISC-V system
anytime soon. They are costly in terms of space and also require
RISC-V emulation to build which is very slow. This change has it so our
multi-platform builds just target ARM and X86.
This actually breaks multi-platform builds when using docker buildx via
the docker-bake.hcl file. Removing this fixes and permits the
multi-platform builds to be built.
This PR is adjusting the constructor to relax template
requirements. In this way child classes are free to provide
their own way of calculating the number of entries and the
shifting required to extract the set
Why do we need this?
Up to this patch we have been configuring the indexing policy
by setting up the cache/table size (in bytes) and the entry size.
Those parameters make a lot of sense in caching structures
where:
a) We want to configure the caching structure using
the amount of storage (in bytes) provided (e.g. 4kB of Cache)
b) the content of a single entry is addressable therefore
we need the entry size to know how many bits in the indexing
process we need to shift to extract the set
In those cases the number of cache entries is derived from the formula
num_entries = size / entry_size
The adoption of the IndexingPolicy for different kinds
of caching structures (e.g. prefetcher tables) make this
way of configuring the IP a bit quirky.
For some tables directly setting the number of entries is a far more
intuitive way of configuring the IP, instead of allocating the desired
number of entries by working things out with the formula above
The Random ser/des support has been non-existent since 2014.
Removing it will enable the Random class to be unit tested
without having a dependency on the src/sim code.
According to the Arm architecture reference manual:
"When the value of HCR_EL2.VM is 1, data cache invalidate instructions
executed at EL1 perform a data cache clean and invalidate"
This behaviour should be exteded to secure mode now that Secure EL2 is
supported
Change-Id: I8b4733e6336a0fd5577f4ef35c0bae5408f91194
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This PR changes memory and cache sizes in various parts of the gem5
codebase to use binary units (e.g. KiB) instead of metric units (e.g.
kB). This makes the codebase more consistent, as gem5 automatically
converts memory and cache sizes that are in metric units to binary
units.
This PR also adds a warning message to let users know when an
auto-conversion from base 10 to base 2 units occurs.
There were a few places in configs and in the comments of various files
where I didn't change the metric units, as I couldn't figure out where
the parameters with those units were being used.
The current GPU_VIPER protocol's TCC cache update the MRU information
twice with calling a_allocateBlock and ut_updateTag which affects the
LIP and RRIP replacement polies. Remove ut_updateTag fixes the LIP and
RRIP replacement polies.
Change-Id: I79ad9392593e00425a7fe8828048465b2c2c2e1f
The Random ser/des support has been non-existent since 2014.
Removing it will enable the Random class to be unit tested
without having a dependency on the src/sim code.
The current GPU_VIPER protocol's TCC cache update the MRU information
twice with calling a_allocateBlock and ut_updateTag which affectgs the
LIP and RRIP replacement polies. Remove ut_updateTag fixes the LIP and
RRIP replacement polies.
Change-Id: I79ad9392593e00425a7fe8828048465b2c2c2e1f
The current GPU_VIPER protocol's TCC cache update the MRU information
twice with calling a_allocateBlock and ut_updateTag which affectgs the
LIP and RRIP replacement polies. Remove ut_updateTag fixes the LIP and
RRIP replacement polies.
Previously, when passing the -re option while using multisim, the files
simerr.txt and simout.txt would be redirected into the m5out directory
instead of the correct subdirectory. They would also have a name of the
format
Spawn_gem5PoolWorker-some-integer_(simout|simerr).txt, which doesn't
indicate which simulation the files correspond to.
This commit fixes these issues by redirecting simerr.txt and simout.txt
into the correct subdirectory.
Change-Id: I0a25a9fd8dc672949f5f85fc5ca6452529301a73
A Dockerfile must start with the importation of a docker base image. It
is only after this point that `LABEL` be provided. Having `LABEL` at the
top of the Dockerfiles resulted in the Docker images failing to build.
This avoids repeating the same switch construct
Change-Id: Ie16c52519b1e1f984284f2f1344a3903a0010d36
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
System(Misc) register accesses are not the only trappable instructions.
We move the exception generation logic (generateTrap) from the
MiscRegOp64 to the base ArmStaticInst
Change-Id: Ie2ba0c39790f50e3e8d504d153025d402283ec95
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This replaces hardcoded integral values with more explicit constant
names in the code allocating functional units to instructions.
This commit follows ba5886aee7 which
should have read:
"If an instruction requires a functional unit that is not present in the
model (e.g., because it is not present in the configuration), O3CPU
treats it as a 1-cycle operation.
This commit changes the behavior to make the cpu panic when this
happens. The cpu panics only if the instruction reaches the head of the
ROB, meaning it is ok to have unsupported instructions on the wrong
path.
Thanks to Chandana S. Deshpande (deshpande.s.chandana@gmail.com) for
finding the issue."
Change-Id: I5e0a37e5fb8404cb5496bd2cb0a9a5baeae3b895
Co-authored-by: Arthur perais <arthur.perais@univ-grenoble-alpes.fr>
This commit is adjusting the constructor to relax template
requirements. In this way child classes are free to provide
their own way of calculating the number of entries and the
shifting required to extract the set
Why do we need this?
Up to this patch we have been configuring the indexing policy
by setting up the cache/table size (in bytes) and the entry size.
Those parameters make a lot of sense in caching structures
where:
a) We want to configure the caching structure using
the amount of storage (in bytes) provided (e.g. 4kB of Cache)
b) the content of a single entry is addressable therefore
we need the entry size to know how many bits in the indexing
process we need to shift to extract the set
In those cases the number of cache entries is derived from the formula
num_entries = size / entry_size
The adoption of the IndexingPolicy for different kinds
of caching structures (e.g. prefetcher tables) make this
way of configuring the IP a bit quirky.
For some tables directly setting the number of entries is a far more
intuitive way of configuring the IP, instead of allocating the desired
number of entries by working things out with the formula above
Change-Id: Ic7994c129196d6ba83dc99ce397ad43393d35252
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This commit adds a test that checks that strings representing
base 10 memory sizes or base 10 memory bandwidths are correctly
converted to strings representing base 2 values.
Change-Id: Ie8cac15f06b4ceb1786484fea4e8ba2111f4e8d3
This commit refactors the base 10 to base 2 error message such
that it uses the preexisting _split_suffix function instead
of a new function based off of _split_suffix. This commit also
removes the new helper function used previously.
Change-Id: I44d9ac3d8b98bcff33d6bfea7ffbdb5009272ede
At present, if an instruction requires a functional unit that is not
present in the O3CPU config, O3CPU treats it as a 1-cycle operation that
does not consume an FU. This seems like a silent failure : if I forgot
to add a FU for a new operation type I added, then I don't want it to
silently work "for free".
The problem is that the code treats the FU allocator returning
`NoCapableFU` for a given DynInst as equivalent to the case where the
DynInst obtained an FU, with default latency of 1. This is because there
is a single if statement that checks whether the FU allocator returned
`NoFreeFU` or not, and `NoCapableFU` happens to be different. The change
is to introduce `NoNeedFU` and to panic if the FU allocator returns
`NoCapableFU`
An improvement would be to use a strongly typed enum rather than integer
constants. Thoughts ?
In addition to unit tests, I have tested this with `main.py run` and get
panics if I remove support for `IntMul` type in `O3CPU.py` in:
```
./SuiteUID-asm-riscv-rv32um-ps-mul-o3-ALL-x86_64-opt/TestUID-asm-riscv-rv32um-ps-mul-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-asm-riscv-rv32um-ps-mulh-o3-ALL-x86_64-opt/TestUID-asm-riscv-rv32um-ps-mulh-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-asm-riscv-rv32um-ps-mulhsu-o3-ALL-x86_64-opt/TestUID-asm-riscv-rv32um-ps-mulhsu-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-asm-riscv-rv32um-ps-mulhu-o3-ALL-x86_64-opt/TestUID-asm-riscv-rv32um-ps-mulhu-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-asm-riscv-rv64um-ps-mul-o3-ALL-x86_64-opt/TestUID-asm-riscv-rv64um-ps-mul-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-asm-riscv-rv64um-ps-mulh-o3-ALL-x86_64-opt/TestUID-asm-riscv-rv64um-ps-mulh-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-asm-riscv-rv64um-ps-mulhsu-o3-ALL-x86_64-opt/TestUID-asm-riscv-rv64um-ps-mulhsu-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-asm-riscv-rv64um-ps-mulhu-o3-ALL-x86_64-opt/TestUID-asm-riscv-rv64um-ps-mulhu-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-asm-riscv-rv64um-ps-mulw-o3-ALL-x86_64-opt/TestUID-asm-riscv-rv64um-ps-mulw-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-BaseCPUProcessor-arm-hello-ALL-x86_64-opt/TestUID-BaseCPUProcessor-arm-hello-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-cpu_test_ArmDerivO3CPU_Bubblesort-ALL-x86_64-opt/TestUID-cpu_test_ArmDerivO3CPU_Bubblesort-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-cpu_test_ArmDerivO3CPU_FloatMM-ALL-x86_64-opt/TestUID-cpu_test_ArmDerivO3CPU_FloatMM-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-cpu_test_RiscvDerivO3CPU_Bubblesort-ALL-x86_64-opt/TestUID-cpu_test_RiscvDerivO3CPU_Bubblesort-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-cpu_test_RiscvDerivO3CPU_FloatMM-ALL-x86_64-opt/TestUID-cpu_test_RiscvDerivO3CPU_FloatMM-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-o3-cpu_1-cores_classic_DualChannelDDR3_1600_arm_boot_test_to-tick-ALL-x86_64-opt/TestUID-o3-cpu_1-cores_classic_DualChannelDDR3_1600_arm_boot_test_to-tick-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-o3-cpu_1-cores_classic_DualChannelDDR3_1600_riscv-boot-test_to-tick-ALL-x86_64-opt/TestUID-o3-cpu_1-cores_classic_DualChannelDDR3_1600_riscv-boot-test_to-tick-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-test-arm-hello32-static-o3-ALL-x86_64-opt/TestUID-test-arm-hello32-static-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-test-arm-hello64-static-o3-ALL-x86_64-opt/TestUID-test-arm-hello64-static-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-test-mips-hello-o3-ALL-x86_64-opt/TestUID-test-mips-hello-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-test-riscv-hello-o3-ALL-x86_64-opt/TestUID-test-riscv-hello-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
./SuiteUID-test-riscv-print-this-o3-ALL-x86_64-opt/TestUID-test-riscv-print-this-o3-ALL-x86_64-opt/simerr.txt:src/cpu/o3/inst_queue.cc:905: panic: Processor cannot execute opclass:2
```
Co-authored-by: Arthur perais <arthur.perais@univ-grenoble-alpes.fr>
**Note**: Erin needs to complete the commit by expanding this test to
properly test the behavior of this change.
To run the pyunit tests:
```sh
scons build/ALL/gem5.opt -j`nproc`
./build/ALL/gem5.opt tests/run_pyunit.py
```
Change-Id: I8cea0fe8b088e03e84072a000444953768bc3151
According to the pybind documentation, "When combining *args or **kwargs
with Keyword arguments you should not include py::arg tags for the
py::args and py::kwargs arguments."
In the current implementation of gem5, if you use the cxxMethod
decorator on a function that has *args or **kwargs, gem5 will
incorrectly add these variables to the pybind generated declaration.
I.e., def f(arg1, arg2, *args, **kwargs): -> .def("f", &f,
py::arg("arg1"), py::arg("arg2"), py::arg("*args"), py::arg("**kwargs"))
which is incorrect pybind code.
To fix this problem, we should ignore variables in the generator if they
are *args or **kwargs. This change skips these variables when creating
the pybind declaration.
Change-Id: I44a1e0eb0b5fc5c1e1d423ba145d456bff92c6b8
There is a compiler error with GCC 12 discussed here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115824
This Pybind code triggers the bug and was causing our compiler tests to
fail.
To fix gem5 compilation for gcc 12 these warnings/errors have been
suppressed for this code.
There is a compiler error with GCC 12 discussed here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115824
This Pybind code triggers the bug and was causing our compiler tests to
fail.
To fix gem5 compilation for gcc 12 these warnings/errors have been
suppressed for this code.
This is a copy and paste of:
https://github.com/pybind/pybind11/pull/5355
Change-Id: I9344951ef00d121ea0b609f4faa13dfe09aabb3b
This commit refactors the error message added to convert.py.
A mapping between the base 10 and base 2 suffix magnitudes
(e.g. k: ki, M: Mi, etc.) and a new function that extracts the
magnitude and numerical value have been added. Also, a warning
message has been added to the toMemoryBandwidth function in
addition to the one in toMemorySize.
Change-Id: I3ae157d13c7089d38a34a6e4c35a2b58978106d0
A bug was uncovered in that for various syscalls that used 64bit
parametres, the ABI for 32bit operating systems was passing the wrong
values to the syscalls, due to discrepancies between the target and
guest OS. This commit fixes that by replacing 64-bit types, or types
that are platform specific in size, with the exact correspondent for the
guest OS, thus producing the correct signature for the respective
syscalls. On top of this, the --param argument is added to the
starter_se script, in order to support attachment of remote debuggers.
This PR adds labels to Dockerfiles. The labels are the source
(https://github.com/gem5/gem5), a description, and the license.
Change-Id: I47ce432257641b394efef4958f1474eefe2a11c1
Co-authored-by: Harshil Patel <harshilp2107@gmail.com>
Unmap queues with queue_sel of 2 unmaps all queues while queue_sel of 3
unmaps all non-static queues. The implementation of 3 was actually
correct for 2. Static queues are queues which were mapped using a map
queues packet with a queue_type of 1 or 2.
This commit adds ability to mark a queue as static. When unmap queues
with queue_sel of 2 is sent, the existing code is now executed. With a
value of 3, we now check if the queue was marked static and do not unmap
it if marked.
Change-Id: I87d7cf78a0600c7baa516c01f42c294d3c4e90c5
Unmap queues with queue_sel of 2 unmaps all queues while queue_sel of 3
unmaps all non-static queues. The implementation of 3 was actually
correct for 2. Static queues are queues which were mapped using a map
queues packet with a queue_type of 1 or 2.
This commit adds ability to mark a queue as static. When unmap queues
with queue_sel of 2 is sent, the existing code is now executed. With a
value of 3, we now check if the queue was marked static and do not
unmap it if marked.
Change-Id: I87d7cf78a0600c7baa516c01f42c294d3c4e90c5
LT <= was previously correct while < is not. Can lead to incorrect
program execution. Related to #1366 related to #1520
Change-Id: I00b7838e920eee7c8adb508e869fdf53a9373e1f
The AssociativeCache is used by different caching agents.
This PR will allow to pass the appropriate flag to the cache so that we
can meaningfully debug
its internals. For instance, when used to model a prefetcher table, will
will pass the
HWPrefetch flag; when used to model a TLB, we will pass the TLB flag.
The Vega ISA's s_memtime instruction is used to obtain a cycle value
from the GPU. Previously, this was implemented to obtain the cycle count
when the memtime instruction reached the execute stage of the GPU
pipeline. However, from microbenchmarking we have found that this under
reports the latency for memtime instructions relative to real hardware.
Thus, we changed its behavior to go through the scalar memory pipeline
and obtain a latency value from the the SQC (L1 I$). This mirrors the
suggestion of the AMD Vega ISA manual that s_memtime should be treated
like a s_load_dwordx2.
The default latency was set based on microbenchmarking.
Change-Id: I5e251dde28c06fe1c492aea4abf9f34f05784420
Explicitly convert to float/double to fix compiler warnings that I have
turned on locally. It might make sense to make use of fplib functions to
be portable across different host float formats but something as simple
as comparison against zero should be safe.
Change-Id: I96c6ee7c5497fece11be07234ff80ff86e7555e2
Programming an event ID while counters are disabled is perfectly fine,
so we should just log this using DPRINTF instead of printing a warn()
every time it happens.
Change-Id: Ib9499857271033ef941f74a7f012d8694328eaf3
I'm not entirely sure what the mandated behaviour is according the the
ARM ARM, but I was very confused by the counters continuing to increment
with the old event even when programmed to an event ID that is not
currently supported by GEM5. Disconnecting the counter if the event is
not supported is less surprising behaviour IMO.
Change-Id: I927d9339c138dafa1484db1515c2aa09b0a9a0a9
This matches the Arm manual and the output produced by capstone. Also
avoid unnecessary spaces in vsel* instruction printing.
Change-Id: I071dd834b7104f10f6358a6b2e2895bdab64df82
We are adding two debug prints in the AssociativeCache:
1) Inserting print
2) Evicting print
Among those, the evicting one is probably the most important
This is because while the DPRINTF can be added in the
Entry::insert implementation (called during insertion),
the AssociativeCache does not reference any evict method.
Instead, the findVictim is transparently invalidating the
victim, which makes it impossible for the client code
to understand whether the victim was a valid
entry or not.
Change-Id: I4fee59cc63c6b0e14c5b02bcf3ba5f58aa21ef9f
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This will allow to print debug information for the associative
cache depending on its usage
Change-Id: Ia64e1cd7cb31fcbac27d031c38cd448ea64c5b4d
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Add declaration of HAFGRTR_EL2 registers and read/write as GPR.
Change-Id: I87570d1e87d479f4530cf2c6e05931cdc26ee361
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This commit contains the rest of the base 2 vs base 10 cache/memory
size clarifications. It also changes the warning message to use
warn(). With these changes, the warning message should now no
longer show up during a fresh compilation of gem5.
Change-Id: Ia63f841bdf045b76473437f41548fab27dc19631
This commit adds a warning message for when cache or memory sizes
will be automatically converted from metric units (e.g. kB) to
binary units (e.g. KiB).
Change-Id: I4ddf199ff2f00c78bbcb147e04bd88c496fa16ae
This commit changes files in src/python/gem5 so that memory and
cache sizes use "KiB", "MiB", and "GiB" instead of "KB", etc. This
makes the codebase more consistent, as gem5 will automatically
convert memory and cache sizes that are in metric units (KB) to
binary units (KiB).
Change-Id: If5b1e908ddcff7182b71e789229a3ba1fa6ad1f1
This PR implements #1429. It mainly achieve so with the following
changes
1) The IndexingPolicy is now a templated SimObject to make its APIs work
with different data types.
As an example, look at the getPossibleEntries, which is requiring an
Addr object whereas we want to be able to call the method with different
keys depending on the Tag
2) The AssociativeCache extracts type information from the Entry
template parameter.
This means any AssociativeCache entry will have to define the following
types:
KeyType = This is the data type used for lookups (in its simplest case,
it is Addr)
IndexingPolicy = This is the base indexing policy SimObject
As an example, the PR is also reworking the TaggedEntry to be
AssociativeCache compliant. This
ultimately allows us to remove the weird overloading of cache querying
methods with the secure flag, and to
remove the AssociativeSet which was providing such weird interface.
As mentioned in the [base, mem-cache: Rewrite TaggedEntry
code](7ee9790464)
commit, further cleanup is needed. TaggedEntry
is really a misleading name as its sole difference with the CacheEntry
(which is also tagged) is the presence of
the secure bit. A better name should be chosen.
We don't store a pointer to the indexing policy anymore.
Instead, we register a tag extractor callback when we
construct the TaggedEntry
Change-Id: I79dbc1bc5c5ce90d350e83451f513c05da9f0d61
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
We don't store a pointer to the indexing policy anymore.
Instead, we register a tag extractor callback when we
construct the CacheEntry
Change-Id: I06dc58e2f67e01f3f9bcd9f0c641505d3aec82ff
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
As detailed by a previous commit, AssociativeSet is not needed anymore.
The class is effectively the same as AssociativeCache
Change-Id: I24bfb98fbf0826c0a2ea6ede585576286f093318
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The only difference between the TaggedEntry and the newly defined
CacheEntry is the presence of the secure flag in the first case. The
need to tag a cache entry according to the security bit required the
overloading of the matching methods in the TaggedEntry class to take
security into account (See matchTag [1]), and the persistance after
PR #745 of the AssociativeSet class which is basically identical
to its AssociativeCache superclass, only it overrides its virtual
method to match the tag according to the secure bit as well.
The introduction of the KeyType parameter in the previous commit
will smoothe the differences and help unifying the interface.
Rather than overloading and overriding to account for a different
signature, we embody the difference in the KeyType class. A
CacheEntry will match with KeyType = Addr,
whereas a TaggedEntry will use the following lookup type proposed in this
patch:
struct KeyType {
Addr address;
bool secure;
}
This patch is partly reverting the changes in #745 which were
reimplementing TaggedEntry on top of the CacheEntry. Instead
we keep them separate as the plan is to allow different
entry types with templatization rather than polymorphism.
As a final note, I believe a separate commit will have to
change the naming of our entries; the CacheEntry should
probably be renamed into TaggedEntry and the current TaggedEntry
into something that reflect the presence of the security bit
alongside the traditional address tag
[1]: https://github.com/gem5/gem5/blob/stable/\
src/mem/cache/tags/tagged_entry.hh#L81
Change-Id: Ifc104c8d0c1d64509f612d87b80d442e0764f7ca
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
As long as the AssociativeCache Entry parameter satisfies the
interface it should be fine. We enforce the bare minimum of having
a replaceable entry.
Doing otherwise will restrict our capability to have a generic cache
with generic tags
Change-Id: I23e32b7540fea6b6e5894aca3d91538e81214932
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The KeyType data type is the type of the lookup and the cache extracts
it from the Entry template parameter
Change-Id: I147d7c2503abc11becfeebe6336e7f90989ad4e8
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This commit is making the AssociativeCache indexing policy
a type extracted from the Entry template parameter
Change-Id: Ic9fb6ccb1b3549aaa250901e91ae3c300b92103e
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Exposing the tag of a cache entry through the associative
cache APIs makes it hard to generalize the cache for
structured tags. Ultimately the tag should be a property
of the cache entry and any tag extraction logic (if needed)
should reside there. In this we can reuse the associative
cache for different Entry params, each one bearing a different
representation of a tag
Change-Id: I51b4526be64683614e01d763b1656e5be23a611b
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Some compilers (gcc version 12.3.0) will start complaining when perfect
forwarding the StrideEntry argument constructed with an extra parameter
(see later patches).
Using a pointer seems to fix the gcc bug.
The commit is also changing the signature of findTable and allocateContext
so that a reference rather than a pointer is return. In this way we don't
deal with the hack of returning a raw ptr from a unique_ptr
Change-Id: Idd451208aae80bbfae76110c859e93084bcb2635
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The code is already assuming a fully associative cache. Rather than
calling getPossibleEntries with a random value and therefore needlessly
passing a vector of pointers, we use the AssociativeCache iterator to
loop over the cache entries
Change-Id: Ic99cbd39ee9f12eef9091d9d62ca24d0c3e61300
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
- Updated search query so that resources that are not compatible with
the gem5 version are still downloaded and used but a warning is thrown
instead of returning an error.
In `get_default_kernel_root_val()`, now prioiritizes the explicit
disk_device passed from the user over the default implemented by the
board.
Also adjusts syntax for selecting this value in
`set_kernel_disk_workload()` for consistency.
It seems that the common use case for setting `disk_device` is that
there is a mismatch between where the disk image is mounted and where
the board expects it by default. In this case, it also seems common that
the root partition will be on this explicit device as well.
In cases where this is not true, explicit kernel arguments can be used
to define the distinct disk device apart from the root. However, this
seems less common than the above so, in that case, it would be easier to
tie these together.
Add declaration of HAFGRTR_EL2 registers and read/write as GPR.
Change-Id: I87570d1e87d479f4530cf2c6e05931cdc26ee361
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This PR fixes the issues with the implementation of the Best Offset
Prefetcher described in issue #1402
On branch bop
Changes to be committed:
modified: src/mem/cache/prefetch/bop.cc
modified: src/mem/cache/prefetch/bop.hh
---------
Co-authored-by: Setu Gupta <setu.gupta.2020@gamil.com>
Co-authored-by: Abhishek Shailendra Singh <abs218@leigh.edu>
Co-authored-by: Setu Gupta <setu.gupta@partner.samsung.com>
Without specifying the "gem5/gpu" directory, this test attempted to run
the entire test suite. This caused the daily and weekly tests to fail.
This change fixes this.
After merging the old personal gem5 repository with the stable version
v24, I tried to run the project inside the `.devcontainer` environment.
During the image build process, I encountered the following error:
```sh
[7683 ms] Start: Run in container: /bin/sh -c ./.devcontainer/on-create.sh
fatal: detected dubious ownership in repository at '/workspaces/gem5'
To add an exception for this directory, call:
git config --global --add safe.directory /workspaces/gem5
[7724 ms] onCreateCommand failed with exit code 128. Skipping any further user-provided commands.
```
This error occurred due to an ownership permission problem, which I
resolved by adding the following line.
* Removes the "docker-compose.yaml" in favor of "docker-bake.hcl". This
uses the `docker buildx` tool which has the advantage of enabling
multi-platformm builds where desired. By default all images are built
targeting `linux/arm64`, `linux/amd64` and `linux/riscv64` as targets
with the exception of the GPU images where only `linux/amd64` makes
sense.
* Remove unused/older Docker build targets (these can easily be re-added
but they were not regularly built or have any current usage).
* Update "README.md" to better describe these Dockerfiles and how they
are built.
* Simplify GCC and Clang compiler images. Each uses the Ubuntu 24.04 All
Deps image as a base then specialized the compiler on top.
* To simply things, all compiler versions are built from 24.04. This
means **narrowing the supported versions from GCC v10 to v14 and Clang
v14 to v18**.
* Fix some bugs in the "docker-bake.hcl" thus ensuring all targets may
be built from it.
* Cleanup the systemc and sst images: reducing their size and building
them off the common 24.04 ubuntu base image.
Some syscalls were incorrectly using 64 bit
integers instead of VPtr's guest pointers,
causing parameter value corruption. This
commit addresses this issue.
Change-Id: If9e27a7c776b802dda18979d1a83a76c23557359
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Same as with the off_t, some syscalls were using
incorrect size parametres in place of a guest-defined
size_t. This commit changes the signature of said
syscalls and adds the size_t typedef to the
arch-dependent Linux OSs.
Change-Id: Iece43814971a8e6275d25f6789e41528d241d1f4
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Some system calls were using incorrect sizing for
offset parametres, which was causing the ABI to pass
wrong values due to size mismatches. One such syscall
is lseek, which in the Arm syscall table was
incorrectly marked as llseek, which does not exist
in aarch64 Linux. In addition, the off_t alias for
general Linux was changed from an unsigned to a
signed type, to accurately reflect the behaviour
in the real-life Linux operating system.
Change-Id: Iada4b66a8933466c162ba9ec901dbdae73c73a18
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This commit either adds the implementation or the ignoreFunc
to the corresponding entry in the syscall table for
some Arm syscalls that were required in order to test
the fix for the incorrect parameter size bug in se mode.
Change-Id: Ifc6d87e2decf1bf96ecd81de6690f92927377bf8
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This commit adds the --param option to the starter_se
configuration script for the Arm ISA. This is in order
to support attaching remote debugger sessions.
Change-Id: I2d8cc9f677f731948872003cca6066d1072ad570
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
A new host tag `gcn_gpu` has been added. This allows for selection of
those GPU tests which depend upon the gcn-gpu docker image to run.
In addition to this, the square GPU tests has been moved to the CI
tests. This ensures some GPU code is compiled and run on every PR.
Since PR #1316, we use sign-extend for all address generation, including
PC, to match the ISA specification for modifiable XLEN. However, when we
set a breakpoint using remote GDB, our address is not sign-extended.
This causes the breakpoint to be set at the wrong address, as specified
in Issue #1463. This PR fixes the issue by sign-extending the address
when setting a breakpoint. This also matches the RISC-V ISA
Specification that "must sign-extend results to fill the entire widest
supported XLEN in the destination register."
Change-Id: I9b493bf8ad5b1ef45a9728bb40fc5e38250fe9c3
Signed-off-by: Yangyu Chen <cyy@cyyself.name>
In `get_default_kernel_root_val()`, now prioiritizes
the explicit disk_device passed from the user over the
default implemented by the board.
Also adjusts syntax for selecting this value in
`set_kernel_disk_workload()` for consistency.
Change-Id: Icddcf438f5b96c2288c3cc608782f191df2c394e
After a lot of debugging and comparing traces I noticed that vrintp was
giving different results from QEMU. An input of 0x3f800000 (1.0) was
being passed to the fplib helpers as (uint32_t)1 which has a completely
different floating-point interpretation and the result was therefore
completely wrong.
I've fixed this as well as all remaining implicit float-to-int
conversions in the ARM instruction execution. There are more
-W(implicit-)float-conversion warnings in the other executors, but for
now this fixes the issue I was seeing.
Change-Id: Ifdeee745ca155d7f4504ac4c54235ac431acdeb9
This PR fixes the issues mentioned in #1448.
**Note that this contribution is the result of a joint collaboration
with @AbhishekUoR**
This PR introduces the following 4 changes:
1. It changes the addresses which are used to compute the stride to
cache line aligned addresses (the current version uses word aligned
addresses)
2. It correctly returns if the stride does not match (as opposed to
issuing prefetches using the new stride incorrectly)
3. It returns if the new stride is 0, indicating multiple reads from the
same cache line.
4. It removes code which is no longer necessary after the addition of
changes number 1 and 3.
Change-Id: Ic346d0e15df6d07e2b93289c8d6b89b4c2f45a34
---------
Co-authored-by: Abhishek Shailendra Singh <abs218@leigh.edu>
1. Builds on top of the Ubuntu 24.04 all-deps image.
2. Unify the download, build, install, and cleanup steps.
Change-Id: I4c2bf8e571dfd228f7df8372cda0f428de59af51
1. Uses the ubuntu-24.04_all-deps as the base image.
2. Unifies the build and cleanup into a single step, thus reducing the
size of the image.
Change-Id: I63b5dad2af0e8b1f6be8ad1f28321c743f36b2dc
These images won't work and make no sense compiling to any platform
other than X86. These are used in SE mode simulations where the host
platform matters.
Change-Id: I47405e930bf511fabcbc93d0b08ee2fb2c556869
1. Uses the all-dependencies image as the base image.
2. Has all compilers use Ubuntu 24.04.
Notes: This change implitly changes our supported compilers to GCC v10
to v13 and Clang v14 to v18. This will be fully incorporated into the
project later.
Change-Id: Id8e2141ea64a34c7e3532605f6ecb7d9ccb76951
This was found while comparing a diverging execution against QEMU traces
and checking for the first mismatched program counter. Fortunately this
was
caused by a branch shortly after this incorrect computation but still
took
a long time to track down.
There are two issues here: the decoder had inverted the cases for *S and
*A,
and the sign bit was wrong for VFN*.
This PR has 3 commits:
- Update scaling methods to scale by multiplication or division when
upcasting or downcasting respectively.
- Preserve the sign when a microscaling conversion results in NaN or
infinity to match hardware.
- Rework rounding to handle cases where conversion results in a denormal
number in the output type so that the value is correct.
Scratch memory requests that are larger than one dword are using a
different memory layout than global instructions. Rather than being
placed contiguously, each dword is interleaved 64 lanes * 4 bytes away
as described in Section 9.1.5.2. "Swizzled Buffer Addressing" in the
MI300 specification. This was verified by comparing MI300 output (which
uses scratch_ instructions) with MI200 (which uses buffer instructions).
MI300 FashionMNIST bs=1 now matches CPU reference.
This requires several changes to the instruction implementations:
- For stores, data in the GPUDynInst can be swizzled before the data is
written to memory. This is easy to do using a helper method. This is
done in the template<int N> variant of initMemWrite. To use this x2
stores are changed to use template<int N> rather than loading a U64. The
swizzle function is renamed to swizzleAddr to avoid confusion with
swizzleData.
- For loads, data is unswizzled in completeAcc when writing register
values. This is not as easy to implement as a helper and is thus
implemented for the three load instructions that load more than one
dword.
- Accessing swizzled data requires at least one packet per dword. A new
GPU memory helper is added to create these packets for scratch requests
specifically. This is called in the template<int N> variant of
initMemRead / initMemWrite. Loads and stores of x2 are changed to use
this variant instead of accessing a U64.
The GPUDynInst status vector restrictions are increased to allow for
swizzled x4 accesses. For simplicity this does not currently support
misaligned swizzled accesses and will panic upon seeing such a case.
Change-Id: Ic686c51e28e0af029a043d5a5b3d4069f2cb94f9
The current implementation does not correctly convert subnormal numbers
(number that fill the underflow gap around zero in floating-point
arithmetic). This commit reworks the rounding code to get correct
results.
First, the min_exp is set to 0 which allows for numbers to become
subnormal when rounding. Second, the rounding code now uses something
closer to "GRS" rounding (guard, round, sticky) which represent the
first bit removed when rounding to a smaller type, the next second bit
removed, and whether any of the other bits removed are one. More details
can be found in the code comments.
Change-Id: Idcd2f1e4383e4012fc3abf73b1f73c847d44f67b
The implementation of microscaling formats uses the Open Compute Project
specification which includes a sign bit for NaN and infinity. This
should be preserved when a conversion results in NaN or infinity.
Change-Id: Id9e99324c6486e256c699016aff301d5f06814d5
Currently there is only a scale() method which multiplies a microscaling
type by an int8 value. This should only be applied when upcasting to
a larger type after conversion to match hardware. When downcasting to a
smaller type, the scaling method should divide by the int8 value before
conversion.
This commit adds both scaling methods.
Change-Id: Ibafa8caa389cde4df609e536cd53bd2289959420
At the moment, a hart does not halt if there are pending interrupts.
However, an implementation can also consider the enable status of the
individual interrupts, i.e., a halted hart would only resume if there
are locally enabled pending interrupts. This commit introduces this
behavior. The wfi behavior is controlled by the new configuration
variable wfi_pending_resume of RiscvISA.
Change-Id: I316239f9732c6e73e6ad692491bca08d773dd995
---------
Signed-off-by: Robert Hauser <robert.hauser@uni-rostock.de>
Functional writes atomically update all copies of a data block, so they
should invalidate any pending LL/SC locks, just like a conventional
write would.
Change-Id: Ic79d2d8d24901f1b6a2ce81dc0e2decc84c0ebbc
Dependency Bot appears to have had difficulty with this file:
https://github.com/gem5/gem5/security/dependabot/29
This PR:
1. Removes the weird "```" which could not be parsed.
2. Ups PyMongo to a more secure version.
Added documentation for `strided_generator.py` and
`strided_generator_core.py.`
Updated clarity of documentation for `linear_generator.py`,
`linear_generator_core.py`, `random_generator.py`, and
`random_generator_core.py`.
Made `max_addr` exclusive instead of inclusive for strided and linear
traffic generation in `strided_gen.cc` and `linear_gen.cc`.
These commits primarily fix the SDMA engine which was (1) using pointer
arithmetic on a variable returned by new and then attempting to free the
modified pointer and (2) using a buffer after it was freed due to the
DMA device calling completion event before Ruby actually completed.
Some minor fixes are included: Stop using uninitialized value as packet
context and using same request pointer for two separate packets for GPU
invalidations.
In gem5, we use the same code base for RISC-V 32 and 64.
However, if we need to allow modifiable XLEN control on CSR.mstatus in
the future, we should follow the RISC-V ISA manual to sign-extend all
the register results, including PC and GPR. If this feature implemented,
the simulator needs to handle user-mode in RV32 but CSR.SATP sets to
Sv39. In this case, 0x80000000 and 0xffffffff80000000 are different
addresses in the 64-bit S-Mode perspective, but they are the same in the
32-bit U-Mode perspective. We should avoid this wrong behavior happening
before we implement this feature.
Thus, we need to sign-extend the results of all the addresses, including
the PC and memory addresses, which currently use zero-extend. As
specified in the RISC-V ISA manual, we use zero-extend in narrow XLEN
mode for the physical address implemented in TLB.
Changes based on spec:
1. Sign-extend narrow XLEN:
https://github.com/riscv/riscv-isa-manual/blob/riscv-isa-release-b7a445a-2024-07-02/src/machine.adoc?plain=1#L567
2. Zero-extend physical address:
https://github.com/riscv/riscv-isa-manual/blob/riscv-isa-release-b7a445a-2024-07-02/src/supervisor.adoc?plain=1#L1670
Signed-off-by: Yangyu Chen <cyy@cyyself.name>
The GPU TLB maxOutstandingReqs field gets limited by the associativity.
In the current setup, this means that the max outstanding requests is 32
even though the setup is for 64 entries. Update the associativity to be
64 entries.
Change-Id: I2104e4647d97bf4d1cf5ac447e38ad6ac6a1a0d8
The SQC and TCC invalidations share a Request pointer which they both
modify. This can cause some problems, so use a different request pointer
for each invalidate. The setContext call is also removed as the value
being assigned to it is uninitialized.
Change-Id: I82ea7aa44a4f4515c1560993caa26cc6a89355af
SDMA packets which use dmaVirtWrites call their completion event before
the write takes place in the Ruby protocol. This causes a use-after-free
issue corruption random memory locations leading to random errors. This
commit adds a cleanup event for each packet that uses DMA and sets the
cleanup latency as 10000 ticks. In atomic mode, the writes complete
exactly 2000 ticks after the completion event is called and therefore a
fixed latency can be used. This is not tested with timing mode, which
does not work with GPUFS at the moment, so a warning is added to give an
idea where to look in case the same issue occurs once timing mode is
supported.
Change-Id: I9ee2689f2becc46bb7794b18b31205f1606109d8
The GPU TLB maxOutstandingReqs field gets limited by the associativity.
In the current setup, this means that the max outstanding requests is
32 even though the setup is for 64 entries. Update the associativity
to all 64 entries.
Change-Id: I2104e4647d97bf4d1cf5ac447e38ad6ac6a1a0d8
The SDMA engine copies data in chunks. It currently uses the pointer
returned from new[] and manipulates it using pointer arithmetic. This
modified pointer is then passed to the completion function which deletes
the pointer. Since it is not the original pointer allocated by new[]
this triggers issues in ASAN.
Change-Id: I03ccf026633285e75005509445c62fcbda8eb978
This change changes the addresses that are printed when TrafficGen
DebugFlag is enabled. Previously, hex strings were printed without a
preceding 0x. This change fixes that to distinguish between decimal and
hex.
The compiler tests are failing to to a compile bug in Clang 7:
https://github.com/gem5/gem5/actions/runs/10170081794
Given Ubuntu 20.04 APT installs v10 by default (i.e., with `apt install
clang`). This is the oldest LTS Ubuntu version. It therefore seems
sensible to drop support for older (<v10) versions of clang.
The compiler tests are failing to to a compile bug in Clang 7:
https://github.com/gem5/gem5/actions/runs/10170081794
Given Ubuntu 20.04 APT installs v10 by default (i.e., with `apt install
clang`). This is the oldest LTS Ubuntu version. It therefore seems
sensible to drop support for older (<v10) versions of clang.
Change-Id: I4c48223b80306422beac1464c09f03397c156ba1
This commit removes some comments and adds constexpr in front
of "bool is_secure..." in pif.cc, signature_path.cc, and
signature_path_v2.cc
Change-Id: Icafe1d7c97d1d3fbf6abc12ba87ebb596255b96f
This modifies the crash fix so that the function calls that were
modified use a local variables called `is_secure` instead of a
hardcoded `false`. Some of these existed previously so it made
more sense to use them, while others were newly added in to mark
where the code might need to be changed later.
Change-Id: I0c0d14b74f0ccf70ee5fe7c8b01ed0266353b3c1
This commit fixes the "Need is_secure arg" crash that occurs when
using the IndirectMemoryPrefetcher, SignaturePathPrefetcher,
SignaturePathPrefetcherV2, STeMSPrefetcher, and PIFPrefetcher. This
was done by changing some variables to be AssociativeSet<...>
instead of AssociativeCache<...> and changing the affected function
calls.
Change-Id: I61808c877514efeb73ad041de273ae386711acae
This PR fixes the "Need is_secure arg" crash that occurs when using
IndirectMemoryPrefetcher, SignaturePathPrefetcher,
SignaturePathPrefetcherV2, STeMSPrefetcher, and PIFPrefetcher. This was
done by changing some variables to have the type AssociativeSet<...>
instead of AssociativeCache<...> and adding in "false" or an existing
value for the value of the secure bit in some function calls. Further
changes may be needed to move away from hard-coding values.
This commit removes some comments and adds constexpr in front
of "bool is_secure..." in pif.cc, signature_path.cc, and
signature_path_v2.cc
Change-Id: Icafe1d7c97d1d3fbf6abc12ba87ebb596255b96f
This modifies the crash fix so that the function calls that were
modified use a local variables called `is_secure` instead of a
hardcoded `false`. Some of these existed previously so it made
more sense to use them, while others were newly added in to mark
where the code might need to be changed later.
Change-Id: I0c0d14b74f0ccf70ee5fe7c8b01ed0266353b3c1
This commit fixes the "Need is_secure arg" crash that occurs when
using the IndirectMemoryPrefetcher, SignaturePathPrefetcher,
SignaturePathPrefetcherV2, STeMSPrefetcher, and PIFPrefetcher. This
was done by changing some variables to be AssociativeSet<...>
instead of AssociativeCache<...> and changing the affected function
calls.
Change-Id: I61808c877514efeb73ad041de273ae386711acae
The GPUMem print for when a memstatus request completes accidentally put
a newline before the word "complete", causing complete to print on a
newline and cause confusion. This commit resolves that.
Whenever a GPU kernel is launching a new WG, the GPUKernelInfo debug
flag will print that the kernel is being launched, without the context
of which WG from that kernel is being launched. This has caused some
confusion to users, who think the entire kernel is being launched
repeatedly. To resolve this confusion, update this print to make it
clear which WG is being launched when this print is enabled.
Bumps [mypy](https://github.com/python/mypy) from 1.10.1 to 1.11.1.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/python/mypy/blob/master/CHANGELOG.md">mypy's
changelog</a>.</em></p>
<blockquote>
<h1>Mypy Release Notes</h1>
<h2>Next release</h2>
<h2>Mypy 1.11</h2>
<p>We’ve just uploaded mypy 1.11 to the Python Package Index (<a
href="https://pypi.org/project/mypy/">PyPI</a>). Mypy is a static type
checker for Python. This release includes new features, performance
improvements and bug fixes. You can install it as follows:</p>
<pre><code>python3 -m pip install -U mypy
</code></pre>
<p>You can read the full documentation for this release on <a
href="http://mypy.readthedocs.io">Read the Docs</a>.</p>
<h4>Support Python 3.12 Syntax for Generics (PEP 695)</h4>
<p>Mypy now supports the new type parameter syntax introduced in Python
3.12 (<a href="https://peps.python.org/pep-0695/">PEP 695</a>).
This feature is still experimental and must be enabled with the
<code>--enable-incomplete-feature=NewGenericSyntax</code> flag, or with
<code>enable_incomplete_feature = NewGenericSyntax</code> in the mypy
configuration file.
We plan to enable this by default in the next mypy feature release.</p>
<p>This example demonstrates the new syntax:</p>
<pre lang="python"><code># Generic function
def f[T](https://github.com/python/mypy/blob/master/x: T) -> T: ...
<p>reveal_type(f(1)) # Revealed type is 'int'</p>
<h1>Generic class</h1>
<p>class C[T]:
def <strong>init</strong>(self, x: T) -> None:
self.x = x</p>
<p>c = C('a')
reveal_type(c.x) # Revealed type is 'str'</p>
<h1>Type alias</h1>
<p>type A[T] = C[list[T]]
</code></pre></p>
<p>This feature was contributed by Jukka Lehtosalo.</p>
<h4>Support for <code>functools.partial</code></h4>
<p>Mypy now type checks uses of <code>functools.partial</code>.
Previously mypy would accept arbitrary arguments.</p>
<p>This example will now produce an error:</p>
<pre lang="python"><code>from functools import partial
</tr></table>
</code></pre>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="570b90a7a3"><code>570b90a</code></a>
Bump version to 1.11</li>
<li><a
href="b3a102ef31"><code>b3a102e</code></a>
Fix <code>RawExpressionType.accept</code> crash with
<code>--cache-fine-grained</code> (<a
href="https://redirect.github.com/python/mypy/issues/17588">#17588</a>)</li>
<li><a
href="aec04c7448"><code>aec04c7</code></a>
Fix PEP 604 isinstance caching (<a
href="https://redirect.github.com/python/mypy/issues/17563">#17563</a>)</li>
<li><a
href="cb44e4d8f1"><code>cb44e4d</code></a>
Fix <code>typing.TypeAliasType</code> being undefined on python <
3.12 (<a
href="https://redirect.github.com/python/mypy/issues/17558">#17558</a>)</li>
<li><a
href="6cf9180e14"><code>6cf9180</code></a>
Fix types.GenericAlias lookup crash (<a
href="https://redirect.github.com/python/mypy/issues/17543">#17543</a>)</li>
<li><a
href="64c1ebf7cf"><code>64c1ebf</code></a>
Bump version to 1.11.1+dev</li>
<li><a
href="dbd5f5cdb6"><code>dbd5f5c</code></a>
Remove +dev from version for 1.11 release</li>
<li><a
href="f0a8c69314"><code>f0a8c69</code></a>
Update CHANGELOG for mypy 1.11 (<a
href="https://redirect.github.com/python/mypy/issues/17540">#17540</a>)</li>
<li><a
href="371f7801e9"><code>371f780</code></a>
CHANGELOG.md update for 1.11 (<a
href="https://redirect.github.com/python/mypy/issues/17539">#17539</a>)</li>
<li><a
href="2563da0c72"><code>2563da0</code></a>
Fix daemon crash on invalid type in TypedDict (<a
href="https://redirect.github.com/python/mypy/issues/17495">#17495</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/python/mypy/compare/v1.10.1...v1.11.1">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Ivana Mitrovic <imitrovic@ucdavis.edu>
At the moment the method simply returns the rank number. This is not
particularly useful when enabling debug flags as the beginning of the
line prints something like:
1: <debug_message>
whereas it should really be:
system.dram.rank1: <debug_message>
Change-Id: I0136dc3d182afa4ae2e5a719cb366d8d0f444667
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This commit adds an error message to src/sim/kernel_workload.cc to tell
the user when the end address of the kernel is greater than the size of
memory. The error message also specifies the minimum memory size needed
to fit the kernel.
Change-Id: I7d8f50889ed8172f64b84f98301a35e5f2f352d3
Implementing generic reset method for MMU allows each ISA implementing
their own reset methods. The default reset MMU method is flush all TLB
entries. For example, The RISC-V needs to do PMP reset when received the
reset signal, but the TLBs don't require to be flushed.
Change-Id: I158261570fb6e5216ec105fbdc53460f83f88d15
This makes it easier to debug unexpected semihosting outputs (in my case
a wrong buffer argument was being passed).
Change-Id: I342610a92fb8efe121d030f7b9ea3307efc4fec3
This is on by default in gem5 (see src/cpu/kvm/BaseKvmCPU.py), however
the perf counters only measure host instruction counters and GPUFS is
not concerned about accuracy of KVM CPU stats. There are also a larger
set of users who have access to KVM, but do not have the paranoid level
low enough to attach performance counters.
Therefore, make the performance counters OFF by default. They can still
be enabled, but this will allow for a larger set of users to follow the
upcoming GPUFS documentation without needing to read through a
troubleshooting section after seeing a gem5 error about the KVM paranoid
level.
Change-Id: I6b465559edf3ce17e7117ada049c60bd39aecd83
These registers were only handled in AArch64 mode but are also
accessible as a c14 registers for AArch32.
Change-Id: I62fe54427e96265df0589308afa1b5d665dbf210
In AArch32 mode it is possible to read a 64-bit counter using mrrc.
Instead of truncating in the PMU code, just allow the instruction
implementation to truncate to 32 bits if accessed using mrc.
Change-Id: I77620f6d1852a7d9e79c1ecee50f4297b4103b1c
This PR has four components:
- Implement a helper method for SDWAB similar to SDWA helpers. SDWAB is
used for VOPC instructions only (vector compares).
- Update two instructions commonly using SDWAB to use helper
(v_cmp_ne_u16 and v_cmp_eq_u16).
- Add panics to *all* VOP1 and VOP2 instructions which do not implement
SDWA or DPP if they use an SDWA or DPP register.
- Add panics to *all* VOPC instructions which do not implement SDWAB or
DPP if they are an SDWA or DPP register.
Only VOP1, VOP2, and VOPC may use SDWA, SDWAB, or DPP. The panics should
therefore cover all instructions which have missing implementations for
these modes. The intent is to exit gem5 instead of continuing simulation
will data that is likely incorrect. Continuing simulation only makes
debugging gem5 more difficult.
If SDWAB or DPP are used on a VOPC instruction and those are not
implemented, it is highly likely to be a problem for the application.
Rather than continue to execute and cause undefined behavior, exit the
simulation with a panic showing the line of the instruction causing the
issue.
Change-Id: Ib3f94df7445d068b26907470c1f733be16cd2fc2
Add a panic if SDWA or DPP is used for an instruction which does not
implement support for it. If an application uses SDWA or DPP it likely
does not operate in the same way as the base instruction and therefore
gem5 should panic rather than continue. It is likely data is incorrect
which will make it more difficult to debug an application.
Change-Id: I68ac448b0d62941761ef4efa0169f95796270f48
This shows an example of how to use the previous commit which adds an
SDWAB helper. The execute() method of both are the same with the
exception of the lambda function passed to the helper method.
Change-Id: I5ffe361440b4020b9f7669c0ed946aa6b3bbec25
Implement a SDWAB helper which accepts a dynamic instruction and a
lambda function defining a comparison function taking two values and
returning a comparison result of 0 or 1 for false or true.
Current instructions which implement SDWA do so on a per-instruction
basis which adds a lot of redundant code. This allows for generic SDWAB
implementations for VOPC instructions.
All modifiers are implemented assuming that SDWBA VOPC instruction
comparison types may be U32, I32, F32, U16, I16, F16 (which exist) but
is extendible to I8, U8, or F8.
Change-Id: Idab58a327c29dd19a1a5457237f3799a04f2031b
Some instructions are clamping floating point outputs unconditionally,
leading to incorrect results. This commit finds instructions with this
issue and checks the clamp bit before applying clamp.
Change-Id: Ibc6de3813d81fd4f9d2c98dd497d19dd34cf6bde
This tests attempts to infer which tests to download per job in the
matrix thereby significantly reducing the download times for each job.
Change-Id: I61b4f4b6410aa86de7437caf213499d805861e0c
This PR improves the vector register groups overlap check in
widening/narrowing
instructions.
- Fix wrong illegal overlap condition between VS2 and VD vector register
groups.
- Also check VS1 vector register group for overlap with VD in
vector-vector
instructions.
- Parametrize widening/narrowing factors in overlap check function to
potentially
handle more cases.
Fixes issue #442.
This commit moves the requires() call that checks the cache coherence
protocol above the imports. This change was made for the chi private l1,
ruby mesi three level, mesi two level, and mi example cache hierarchies.
This ensures that a clear error message about having the wrong coherence
protocol is printed, rather than a less useful message.
Change-Id: I3bac1ffcb1f8a9d94e486237f880cf248e442ba8
This fixes the remaining implicit int/float conversions and enables the
float conversion warnings for clang when building the Arm instruction
execution logic. This depends on the previous fixes.
Change-Id: I51aac94644a483175842c36da2d49d308aaceb49
This changes `long`s in src/mem/physical.cc, which are 32 bits or more,
to `uint64_t`s, which are exactly 64 bits.
Change-Id: I64e089a2ac087bcf58b9c3c918c59dc5ff75d010
TimingExpression enables runtime calculation of the commit latency in
MinorCPU. For this, machInst is obtained by getEMI() to match it with a
given instruction. At default, getEMI() always returns 0 and is
therefore overwritten to enable timing expressions for RISC-V. This was
already done for ARM (see src/arch/arm/insts/static_inst.hh).
Change-Id: I03d669b3439fd24e00cbf893f5db9951dfe56b1f
Signed-off-by: Robert Hauser <robert.hauser@uni-rostock.de>
Updated the bib information of the local RISC-V interrupts.
Change-Id: I666c3df4529e159bd1946ca1a9623e47f84d5d9e
Signed-off-by: Robert Hauser <robert.hauser@uni-rostock.de>
Currently, only the VS2 register group is checked for overlap with VD
when executing a widening/narrowing instruction. This commits extends
the check to VS1, when applicable (i.e. vector-vector operations).
Change-Id: I892b7717c01e25546fb41e05afbd08fc40c60c59
As of now, the widening/narrowing vector register groups overlap check
always assumes a SEW multiplication factor equal to 2 (for either VD or
VS2). This commits aims at making this check more generic.
Change-Id: I4311fc3624cd324ccfdf2a1920a19efc85357120
This commit fixes the overlap check between VS2 and VD register groups
in vector widening instructions. While the narrowing instructions check
is correct, the widening one has to differentiate between two cases
(Vs2 EEW = 2*SEW and Vs2 EEW = SEW). In the first case, overlap is
allowed, as the EEW is the same as Vd. In the second case, the overlap
legality check has to be adapted to use the Vs2 EMUL to calculate the
boundaries. The rule has been derived again from Section 5.2 of RISC-V
"V" Vector Extension specifications, version 1.0.
The patch also includes some small code refactoring, e.g. using
already defined vlmul and constants for vector operands.
Fixes issue #442.
Change-Id: Ic87095fb9079e6c8f53b9a0d79fbf531a85dc71d
Vector reduce float (widening and non-widening) and integer (widening)
instructions initialize the reduce loop operation with the first element
of the destination register (i.e. `Vd[0]`).
Since all reductions per spec seem to be `Vd[0] = Vs1[0] + Vs2[*]`
(where `+` is an arbitrary binary op and `*` indicates all active
elements) gem5 will calculate this incorrectly if `Vd[0]` and/or
`Vs1[0]` are non-neutral for the operation (the later case being because
it's not taken into account at all).
To solve this we just have to initialize the reduction loop to `Vs1[0]`
(the non-widening integer reduction already does this).
If the threacContext of CPU enters the suspend mode, raise the threadID
of threadContext cpu_idle_pins with the high signal to target. If the
threadContext of CPU enters the activate mode, lower the threadID of
thread cpu_idle_pins with low signal to target.
The gem5 crashed when user try to update register value from GDB because
PR[1] changes the index of CSR_XSTATUS to MISCREG_XSTATUS, which is out
of NUM_PHYS_MISCREGS.
The CSR_XSTATUS should use setRegWithMask to update it.
[1] : https://github.com/gem5/gem5/pull/1099
gem5 issue: https://github.com/gem5/gem5/issues/1299
Change-Id: Iefc0d1f5adfb98ecfda0e74907964b47d1864b6d
These two commits add agnostic capability for both tail/mask policies,
for vector memory and arithmetic instructions respectively. The common
policy for instructions is to act as undisturbed if one is (i.e. tail or
mask), or write all 1s if none.
For those instructions in which multiple micro instructions are
instantiated to write to the same register (`VlStride` and `VlIndex` for
memory, and `VectorGather`, `VectorSlideUp` and `VectorSlideDown` for
arithmetic), a (new) micro instruction named `VPinVdCpyVsMicroInst` has
been used to pin the destination register so that there's no need to
copy the partial results between them. This idea is similar to what's on
ARM's SVE code. This micro also implements the tail/mask policy for this
cases.
Finally, it's worth noting that while now using an agnostic policy for
both tail/mask should remove all dependencies with old destination
registers, there's an exception with `VectorSlideUp`. The
`vslideup_{vx,vi}` instructions need the elements in the offset to be
unchanged. The current implementation overrides the current vta/vma and
makes them act as undisturbed, since they require the old destination
register anyways. There's a minor issue with this though, as
`v{,f}slide1up` variants do not need this, but since they share the same
constructor, will act all the same.
Related issue #997.
1. Responder (downstream components):
When sending a BEGIN_REQ, the timing annotation marks the time when
a transaction is visible to the target (see [1] on page 465).
When writing the data, the downstream component calculates the
transfer time and would send END_REQ after this time (see [1] on
page 540). Therefore, not the payloadDelay, but the headerDelay
should be used, as already written as a comment in the source files.
When reading data, payloadDelay will be 0 anyway.
2. Requester (upstream component):
For data read, the begin of the transfer is marked by BEGIN_RESP
and the upstream component would delay END_RESP to model the
data transfer (see [1] on page 540). Therefore, BEGIN_RESP should be
delayed by the headerDelay, not the payloadDelay.
[1] "IEEE Standard for Standard SystemC® Language Reference Manual," in
IEEE Std 1666-2023 (Revision of IEEE Std 1666-2011), vol., no.,
pp.1-618, 8 Sept. 2023, doi: 10.1109/IEEESTD.2023.10246125.
Change-Id: I3b5e8ad6bc37cbb309b124efdc8764fca3728b7a
Signed-off-by: Robert Hauser <robert.hauser@uni-rostock.de>
Implement small translation table extension.
This feature relaxes the lower limit on the size of the translation
tables, by increasing the maximum permitted values of the T1SZ and T0SZ
field in: TCR_EL1, TCR_EL2, TCR_EL3,VTCR_EL2 and VSTCR_EL2
Change-Id: I4c2187815b2d7f14407edb38095c6bcc2004b62a
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This PR is introducing the concept of IPA space in gem5, which is
necessary after the implementation
of FEAT_SEL2. In fact we can now have Secure and Non-Secure intermediate
physical address spaces when the PE is
executing in Secure state.
We are currently using the LongDecriptor for both stage1
and stage2 translations. There are several cases where
the bitfield meaning changes depending on the translation
stage.
Change-Id: Ic33d9ef225a57fd79ce2b4bf47896aeb6bdd8d9c
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
For ARMv8 CPUs this register allows reading a 64-bit cycle counter in
from 32-bit execution state.
Change-Id: I7cd9e2711ada5156920440cc3c89e7a74ca54a49
This patch is adding a functional implementation of FEAT_XS. Unless we
operate with DVM enabled, TLBIs broadcasting is accomplished in 0 time;
so there is no timing benefit introduced by enabling FEAT_XS other than
the way it affects TLB management (invalidation)
Change-Id: I067cb8b7702c59c40c9bbb8da536a0b7f3337b5d
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This was fixed to v4.0.0 under the assumption the flaky nature or the
daily-tests.yaml workflow was due to a later, minor v4 version causing
issue. This did not work. Ergo this patch reverts back to using the
latest v4 version.
Change-Id: I72b8811022268f34309de193445987dbe0085951
This Workflow is flakey and it appears to msotly around the usage of the
the merging of all the gem5 builds into a single artifact. In attempt to
stabalize the workflow this merge step has been removed. ALL jobs now
download all gem5 binaries.
Change-Id: Ib1e9d82514c3d5e5af9de974a477e213f8af2aaa
This is made to run on the 'stable' branch to schedule workflow runs on
the `develop` branch. This solves the problem of GitHub Workflows being
scheduled to only run on 'stable' branch' thus ignoring changes made to
them on 'develop'
With this schedule we no longer need to force a checkout of 'develop' in
the workflows. As such these have been removed.
The scheduled workflows are now triggered via "workflow_dispatch" via
the "scheduler.yaml" workflow
This guarantees all changes put on the staging branch and, for whatever
reason, put on stable are on develop.
In addition this PR reverts specific release procedures (e.g., reverting
the removal removing the -Werror compilation flag, and changing the
versioning back to "DEVELOP").
This guarantees all changes put on the staging branch and, for whatever
reason, put on stable are on develop. This syncs the branches.
Change-Id: Ib3513f49977bb4ed3046c2d9d6cf162953b15887
The acquire-release flavor of the ldadd instruction should read ldaddalx
(eg. ldaddalb/ldaddalh) according to specification. However, this is
currently noted as ldadd"la"x (eg. ldaddlab/ldaddlah).
Issue: https://github.com/gem5/gem5/issues/1224
Change-Id: Ib932fa0e572207729c923c27f24c34cc21dff0e5
Co-authored-by: Bobby R. Bruce <bbruce@ucdavis.edu>
- gem5 was querying the full version of gem5 that is `24.0.0.0` while
searching for resources.
This was causing an error to find resources on staging branch.
This change trims the gem5 version to be just the major.minor version.
Change-Id: I30c3a1b38c631981f797ef0fd2b616e6a66ca18e
- gem5 was querying the full version of gem5 that is `24.0.0.0` while
searching for resources.
This was causing an error to find resources on staging branch.
This change trims the gem5 version to be just the major.minor version.
Change-Id: I30c3a1b38c631981f797ef0fd2b616e6a66ca18e
This change adds a new utility function for processing Spatter traces
into SpatterKernels under parse_kernels.
Additionally, it adds documentation for all the utility functions in
spatter_kernel.py.
Lastly, it adds an example script for running one spatter trace using
SpatterGenerator to the examples.
This change extend the AbstractMemory class to add a getter method that
allows other components to get the memory's range without interleaving.
This method will be useful if other components in the system need to
interleave the memory range different to the way the memory has
interleaved them.
This change extend the AbstractMemory class to add a getter method that
allows other components to get the memory's range without interleaving.
This method will be useful if other components in the system need to
interleave the memory range different to the way the memory has
interleaved them.
This change adds a new utility function for processing Spatter traces
into SpatterKernels under parse_kernels.
Additionally, it adds documentation for all the utility functions in
spatter_kernel.py.
Lastly, it adds an example script for running one spatter trace using
SpatterGenerator to the examples.
Removing -Werror flag on the stable branch ensures that as new compilers
are releases (likely withs stricter warnings) gem5 remains compilable.
Change-Id: I0267c895414b630c1d7cd9b28236249790b3006f
Previously, all of the TLB lookup/insert functions were using the full
virtual addresses even though the variables in the functions said "vpn."
This change explicitly converts the virtual address to the VPN without
any least significant zeros for the offset. I.e., vpn >> page_size.
The main bug solved in this changeset is the asid was |'d with the upper
bits of the virtual address, but sometimes there were all 1's.
Therefore, you could get a TLB hit even if the ASID was different.
Interestingly, the page that seemed to cause these issues was a 1 GiB
page.
This change also starts refactoring some of the page table details to
support sv46 and sv57 page table formats.
In my testing, the Linux kernel boot uses large pages (even OpenSBI uses
large pages), so it seems that large pages also work. However, this
seems like magic to me, so I'm not sure if it's correct.
This change also updates some asserts, and debug statements with more
useful debugging information.
Partially fixes#1235. More testing needs to be done to be confident.
Introduced in #1234, this caused compilation to faill in Apple Silicon
systems. This bug is the same as #582 where a more detailed explanation
is provided.
This change fixes the way indices are generated in a multi generator
setup.
It changes it from all cores generating the same trace of indices for
accessing the index array to each core generating an interleaved subset
of indices.
For an example look below for traces (indices to index array) in a 2
core setup.
Before:
core_0: 0, 1, 2, 3, 4, 5, 6, 7, ...
core_1: 0, 1, 2, 3, 4, 5, 6, 7, ...
After:
core_0: 0, 1, 2, 3, 8, 9, 10, 11, ...
core_1: 4, 5, 6, 7, 12, 13, 14, 15, ...
Additionally, this change fixes the SpatterKernel class in the standard
library to comply with the change in the SpatterGen source code.
Rather than adding the options to *every* config that might be using
GPU_VIPER.py, just change the Ruby config to check if the option is
available before trying to use it. Otherwise, reverts to what was the
default on stable.
Change-Id: Ia6f1d0827d489ee2a35c598b644461cbff59e247
I noticed while using the stable branch that there were a few typos of
the word 'cache' and so I've corrected a few files where I found such
typos.
Change-Id: I7c7f64812039f34fe39d0c45c4f5ce921cba06d0
Often, you want to add another argument to the default kernel arguments.
This function allows you to do that on the `kernel_disk_workload` board
mixin.
This allows for multiple gem5 simulations to be spawned from a single
parent gem5 process, as defined in a simgle gem5 configuration. In this
design _all_ the `Simulator`s are defined in the simulation script and
then added to the mutlisim module. For example:
```py
from gem5.simulate.Simulator import Simulator
import gem5.utils.multisim as multisim
# Construct the board[0] and board[1] as you wish here...
simulator1 = Simulator(board=board[0], id="board-1")
simulator2 = Simulator(board=board[1], id="board-2")
multisim.add_simulator(simulator1)
multisim.add_simulator(simulator2)
```
This specifies that two simulations are to be run in parallel in
seperate threads: one specified by `simulator1` and another by
`simulator2`. They are then added to MultiSim via the
`multisim.add_simulator` function. The user can specify an id via the
Simulator constructor. This is used to give each process a unique id and
output directory name. Given this, the id should be a helpful name
describing the simulation being specified. If not specified one is
automatically given.
To run these simulators we use `<gem5 binary> -m gem5.utils.multisim
<script> -p <num_processes>`. Note: multisim is an executable module in
gem5. This is the same module we input into our scripts to add the
simulators. This is an intentionally modular encapsulated design. When
the module processes a script it will schedule multiple gem5 jobs and,
dependent on the number of processes specified, will create child gem5
processes to processes tjese jobs (jobs are just gem5 simulations in
this case). The `--processes` (`-p`) argument is optional and if not
specified the max number of processes which can be run concurrently will
be the number of available threads on the host system.
The id for each process is used to create a subdirectory inside the
`outputdor` (`m5out`) of that id name. E.g, in the example above the
ID's are `board-1` and `board-2`. Therefore the m5 out directory will
look as follows:
```sh
- m5out
- board-1
- stats.txt
- config.ini
- config.json
- terminal.out
- board-2
- stats.txt
- config.ini
- config.json
- terminal.out
```
Each simulations output is encapsulated inside the subdirectory of the
id name.
If the multisim configuation script is passed directly to gem5 (like a
traditional gem5 configuraiton script, i.e.: `<gem5 binary> <script>`),
the user may run a single simulation specified in that script by passing
its id as an argument. E.g. `<gem5 binary> <script> board-1` will run
the `board-1` simulation specified in `script`. If no argument is passed
an Exception is raised asking the user to either specify or use the
MultiSim module if multiprocessing is needed.
If the user desires a list of ids of the simulations specified in a
given MultiSim script, they can do so by passing the `--list` (`-l`)
parameter to the config script. I.e., `<gem5 binary> <script> --list`
will list all the IDs for all the simulations specified in`script`.
This change comes with two new example scripts found in
'configs/example/gem5_library/multsim" to demonstrate multisim in both
an SE and FS mode simulation. Tests have been added which run these
scripts as part of gem5' Daily suite of tests.
Notes
=====
* **Bug fixed**: The `NoCache` classic cache hierarchy has been modified
so the Xbar is no longet set with a `__func__` call. This interfered
with MultiProcessing as this structure is not serializable via Pickle.
This was quite bad design anyway so should be changed
* **Change**: `readfile_contents` parameter previously wrote its value
to a file called "readfile" in the output dorectory. This has been
changed to write to a file called "readfile_{hash}" with "{hash}" being
a hash of the `readfile_contents`. This ensures that, during multisim
running, this file is not overwritten by other processes.
* **Removal note**: This implementation supercedes the functionality
outlined in 'src/python/gem5/utils/multiprocessing'. As such, this code
has been removed.
Limitations/Things to Fix/Improve
=================================
* Though each Simulator process has its own output directory (a
subdirectory within m5out, with an ID set by the user unique to that
Simulator), the stdout and stderr are still output to the terminal, not
the output directory. This results in: 1. stdout and stderr data lost
and not recorded for these runs. 2. An incredibly noisy terminal output.
* Each process uses the same cached resources. While there are locks on
resources when downloading, each processes will hash the resources they
require to ensure they are valid. This is very inefficient in cases
where resources are common between processes (e.g., you may have 10
processes each using the same disk image with each processes hashing the
disk images independently to give the same result to validate the
resources).
Change-Id: Ief5a3b765070c622d1f0de53ebd545c85a3f0eee
---------
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Co-authored-by: Jason Lowe-Power <jason@lowepower.com>
This PR adds source code for C++ implementation of SpatterGen as well as
SpatterKernel. SpatterGen uses a PyBindMethod to add kernels to the
backend code. This way the process of processing json files could be
offloaded to python. In addition it adds standard library components for
SpatterGenCore and SpatterGen. These two components follow the same
structure as AbstractCore and AbstractProcessor. In addition
spatter_kernel.py adds a definition for SpatterKernel in python to make
adding kernels to C++ easier. Also it adds utility functions for parsing
dictionaries read from json as well as partitioning traces for multicore
setups.
Currently, gem5's inst tracer prints the whole vector register container
by default. The size of vector register containers in gem5 is the
maximum size allowed by the ISA. For vector-length agnostic (VLA) vector
registers, this means ARM SVE vector container is 2048 bits long, and
RISC-V vector container is 65535 bits long. Note that VLA implementation
in gem5 allows the vector length to be varied within the limit specified
by the ISAs.
However, in most use cases of gem5, the vector length is much less than
65535 bits. This causes two issues: (1) the vector container requires
allocating and moving around a large amount of unused data while only a
fraction of it is used, and (2) printing the execution trace of a vector
register results in a wall of text with a small amount of useful data.
This change addresses the problem (2) by providing a mechanism to limit
the amount data printed by the instruction tracer. This is done by
adding a function printing the first X bits of a vector register
container, where X is the vector length determined at runtime, as
opposed to the vector container size, which is determined at compilation
time.
Change-Id: I815fa5aa738373510afcfb0d544a5b19c40dc0c7
---------
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
This is a follow-up on the discussion here [1].
The IsInvalid flag was previously defined as an instruction that does
not appear in the ISA. However, a micro-architecture can choose to not
recognize an instruction in and raise illegal instruction fault even if
the instruction is in the ISA.
This change modifies the definition of a Invalid instruction such that,
if a StaticInst instruction is marked as IsInvalid, it means the
instruction is not recognized by the decoder. This means that any
instruction recognized by the decoder are not invalid, even if the
instruction is not in the official ISA spec; e.g., m5
pseudo-instructions.
Note that instructions that are recognized by the decoder but are chosen
to act as a nop are not invalid. This applies to WarnUnimplemented
instructions, e.g. hint instructions.
[1] https://github.com/gem5/gem5/pull/1071
Change-Id: I1371b222d8b06793d47f434d0f148c5571672068
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
- Fix address calculation issue with scratch_* instructions when SVE bit
is 0.
- Fix ds_swizzle_b32 not mapping to execution unit.
- Implement VOP3 V_FMAC_B32.
- Fix architected scratch address register being clobbered.
Tested with MNIST from PyTorch quickstart tutorial and nanoGPT on
mi300.py.
Currently writing to SRF which is incorrect, as the physical register
number can be clobbered by another wavefront if registers get renamed to
the physical register number.
Fix this by actually architecting the register, i.e., there is a
dedicated "hardware" register in the wavefront class.
Change-Id: I94e9e463eed348b2928cae884c1c20566c00984d
For scratch instructions only, this bit specifies if an offset in a VGPR
should be used for address calculation. This is new in MI300 and was
previously the LDS bit. The LDS bit is rarely used and in fact gem5 does
not even check this bit.
This fixes a bug when SADDR == 0x7f (i.e., no SGPR should be used) where
a VGPR was being added to the address when it should have been ignored.
Change-Id: I9864379692df6795b25b58b98825da05d18fc5db
This change adds code for SpatterGenCore and SpatterGen as well
as SpatterKernel to the standard library. SpatterGenCore and
SpatterGen follow the same structure as AbstractCore and
AbstractProcessor. spatter_kernel.py adds utility functions
to parse dictionaries as well as partition a list into
multiple lists through interleaving to be used when setting up
a multicore SpatterGen.
Change-Id: I003553e97f901c0724f5feac0bb6e21a020bd6ad
This change adds source code for SpatterGen ClockedObject.
The set of source code pushed includes code for SpatterKernel
that tracks whether information is being gathered or scattered
as well as the list of indices to be accessed. This model
has PyBindMethod to add SpatterKernels from python.
This way all the preparations for kernels can be done in python.
SpatterGen has a few parameters that model limits on a few of
hardware resources in the backend of a processor, e.g. number
of functional units to calculate effective address, the latency
of calculating effective address, number of integer registers.
Change-Id: I451ffb385180a914e884cab220928c5f1944b2e3
A load instruction can be replayed when
1) it's strictly ordered or
2) it falls into load-store forwarding mismatch.
Case 1 was considered in executeLoad function but the case 2 wasn't. It
causes the case-2 replayed load instruction to violate the assertion
condition "assert(!load_inst->isExecuted())" in LSQUnit::read. This
commit fixes the problem by adding consideration of the case 2 in
LSQUnit::executeLoad.
Co-authored-by: Minje Jun <minje.jun@samsung.com>
The GPUFS scripts include support for dumping and resetting
stats at kernel boundaries by identifying specific GPU kernel
exit events. This commit extends that support to work with
GPU SE-mode support.
Change-Id: I662233ae71e2987d90af1fd0100e29036b2ef1c6
The IsInvalid flag indicates that the static instruction is not part of
the executing ISA and not part of m5's pseudo-instructions. This flag
provides a way to recognize an illegal instruction at the decode stage.
GPU_VIPER.py was modified to use these options but they did not exist,
breaking GPUFS. This commit adds them to fix the issue.
Change-Id: I0095f400ea606c4e8d91a41870ef208465cef803
Change-Id: I3733b31baf187e0d3d38d971d9423a1b1afe2296
gpu-compute: add GPU RubyHitMiss for TCP and TCC
Change-Id: I4430532b901811e03d9b077b61e2eca4557b34e1
gpu-compute: Add RubyHitMiss flag for TCP and TCC cache
Change-Id: I4e5d1127c84b9eb1060ec9ba0b6638267449eda5
gpu-compute: Add RubyHitMiss flag for TCP and TCC cache
Change-Id: I4e5d1127c84b9eb1060ec9ba0b6638267449eda5
Remove space
Change-Id: I401f528c6f128ba0956bdbc232e8f2ae37bf648c
When a compute unit issues several requests to the same line,
the requests wait in the L2 if it is a writeback cache. If the line is
invalid initially and the first request is atomic in nature, the L2
cache issues a request to main memory. On data return, the cache line
transitions to M but doesn't wake up the other requests, resulting in
a deadlock. This commit adds a wakeup call on data return for atomics
and fixes potential deadlocks.
This PR incorporates numerous improvements and fixes to the gem5
PyStats. This includes:
* PyStats now support SimObject Vectors. The PyStats representing them
are subscribable and therefore acceptable by accessing an index: e.g.,:
`simobjectvec[0]`. (This replaces the `Vector` group PyStat)
* Adds the `SparseHist` PyStats.
* Adds the `Vector2d` to PyStats.
* The `Distribution` PyStats is fixed to be a vector of Scalars.
* Tests added for the PyStat's Vector and bugs fixed.
'v4.0.0' wasn't working. The following error was occurred:
```
Can't find 'action.yml', 'action.yaml' or 'Dockerfile' for action 'actions/upload-artifact/merge@v4.0.0'.
```
Change-Id: I658b0fe292df029501fbc1286acb06f4014ae4e1
'v4.0.0' wasn't working. The following error was occurred:
```
Can't find 'action.yml', 'action.yaml' or 'Dockerfile' for action 'actions/upload-artifact/merge@v4.0.0'.
```
Change-Id: I658b0fe292df029501fbc1286acb06f4014ae4e1
This commit adds instSeqNum to the atomic responses in
GPU_VIPER-TCC.sm. This will be useful when debugging issues related to
GPU atomic transactions
Change-Id: Ic05c8e1a1cb230abfca2759b51e5603304aadaa3
When a compute unit issues several requests to the same line,
the requests wait in the L2 if it is a writeback cache. If the line is
invalid initially and the first request is atomic in nature, the L2
cache issues a request to main memory. On data return, the cache line
transitions to M but doesn't wake up the other requests, resulting in
a deadlock. This commit adds a wakeup call on data return for atomics
and fixes potential deadlocks.
Change-Id: I8200ce6e77da7c8b4db285c0cc8b8ca0dfa7d720
Adding RP_choose functions to change replacement policies among
TreePLRU, LRU, FIFO, LFU, LIP, MRU, NRU, RRIP, SecondChance AND ShiPMem
replacement
policies for TCC, TCP and SQC caches for GPU
The IsInvalid flag indicates that the static instruction is not part
of the executing ISA and not part of m5's pseudo-instructions. This
flag provides a way to recognize an illegal instruction at the decode
stage.
Change-Id: I2779c6edcd8c5e6a77ea11cad3ff73bacb79d800
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
Adding RP_choose function to change replacement policies among
TreePLRU, LRU, FIFO, LFU, LIP, MRU, NRU, RRIP, SecondChance AND ShiPMem replacement
policies for TCC, TCP and SQC caches for GPU
Change-Id: I86cc41cca19f8e0d24d8cf015e2e034a1fc4bc43
Adding RP_choose functions to change replacement policies among
TreePLRU, LRU, FIFO, LFU, LIP, MRU, NRU, RRIP, SecondChance AND ShiPMem replacement
policies for TCC, TCP and SQC caches for GPU
Adding RP_choose function to change replacement policies among
TreePLRU, LRU, FIFO, LFU, LIP, MRU, NRU, RRIP, SecondChance AND ShiPMem replacement
policies for TCC, TCP and SQC caches for GPU
Adding RP_choose functions to change replacement policies among
TreePLRU, LRU, FIFO, LFU, LIP, MRU, NRU, RRIP, SecondChance AND ShiPMem replacement
policies for TCC, TCP and SQC caches for GPU
Adding RP_choose functions to change replacement policies among
TreePLRU, LRU, FIFO, LFU, LIP, MRU, NRU, RRIP, SecondChance AND ShiPMem replacement
policies for TCC, TCP and SQC caches for GPU
When accessing memory using functionalAccess(), the MMU could tell us to
use a nonsecure access even though the CPU is operating in secure mode.
I noticed this while trying to run a simple semihosting hello world with
the MMU+caches enabled and the semihosting calls ended up reading from
memory instead of the caches due to an S/NS mismatch.
See also https://github.com/gem5/gem5/pull/1198 which happens to also
mask the issue I saw, but I believe both changes are needed.
Change-Id: I9e6b9839b194fbd41938e2225449c74701ea7fee
Some arithmetic instructions of the riscv vector extension where still
using the default VLEN=256 instead of the dynamic one through the
inherited `vlen` attribute. Most of them only use this to calculate the
effective index for the mask element like so:
```
uint32_t ei = i + vtype_VLMAX(vtype, vlen, true) * this->microIdx;
if (this->vm || elem_mask(v0, ei)) {
...
```
This means that instructions will wrongly compute the mask index in the
second and subsequent micro instructions (`microIdx` > 0). This commit
fixes this by adding the corresponding `set_vlen` snippet to the
affected instruction formats.
Change-Id: Ib041de972d6938490741a9fb4c214a6a5172c34e
While running a simple Arm32 binary, I noticed that all memory
transactions were being marked as NS instead of S once I turn on the MMU
(even though the page tables have the NS bit set to zero). The result of
this was that semihosting calls were failing since they were using
functional accesses with the SECURE flag set, but the caches only
contained NS tagged entries so these accesses always read stale values
from DRAM.
Digging through the Arm MMU code it appears that the NS bit lookup was
being keyed of the `secureLookup` flag which is only used for long
descriptors. I believe 0c28712f51 should
have used isSecure instead of secureLookup. To avoid using these
uninitialized values in the future I wrapped the LPAE state in a
std::optional to ensure that it is only accessed once initialized.
Change-Id: Ibc406ed3f4cfa768f470e34a5eca3c1a2bf45cd8
The SYS_WRITEC and SYS_WRITE0 calls are specified as writing to the
debug channel, so it is a reasonable expectation for these messages to
be visibile immediately after the semihosting call.
Change-Id: I8e6e9a7aab593a59e82ecb9cf4603c18c7a8acbe
Clang warns as follows: `warning: definition of implicit copy
constructor for 'TranslResult' is deprecated because it has a
user-declared copy assignment operator`
Change-Id: Ic701d8522aac75d569f4f513f54de91f76a17e48
`src/dev/virtio/VirtIORng 2.py` is identical to
`src/dev/virtio/VirtIORng.py`, and the former does not appear in any
build script.
Change-Id: I9c5f1b1a3809d1c7028b630c32310e540613e232
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
Fixed the assertion statement in the cpu's translation.hh file so that
it doesn't fail the assertion if the cache is clean.
I compile this c code to `test`
```c
#include <stdio.h>
static inline void clflush(volatile void *p) {
__asm__ volatile ("clflush (%0)" : : "r"(p) : "memory");
}
int main() {
int data = 42; // Example variable
printf("Value before clflush: %d\n", data);
clflush(&data);
printf("Value after clflush: %d\n", data);
return 0;
}
```
And run it with this script
`./build/X86/gem5.opt configs/learning_gem5/part1/two_level.py ./test`
In order to verify that it no longer fails the assertion check.
GitHub Issue: #862
Change-Id: I6004662e7c99f637ba0ddb07d205d1657708e99f
This feature has been available since Vega10 but was never implemented.
MI300 adds a few new instructions that make use of this more often
(e.g., v_mov_b64).
Change-Id: Ieeb7834462b76d77c0030f49622d0de09f90c9e4
This instruction is a simple move from accumulation register to
accumulation register. It is essentially a move with the accumulation
offset added to the register index.
Change-Id: Ic93ae72599b75c91213f56ebafe5bbd7b2867089
Flat, scratch, and global share the same instruction implementation with
different address calculations essentially. These instructions were
already implemented but not added to the decoder. This commit adds the
remaining scratch instructions which have a shared instruction
implementation.
Change-Id: I8f2e9ceb221294dce1b81c45745b642f0592d985
The lines `constexpr int B_I = std::ceil(64.0f / (N * M / H));` caused
the following compilation error in clang Version 16:
```
error: constexpr variable 'B_I' must be initialized by a constant
expression
```
`std::ceil` is not a const expression. Therefore instances of this
expression in instructions.hh have been replaced with a constant
expression friendly alternative.
This is calling our compiler tests to fail:
https://github.com/gem5/gem5/actions/runs/9288296434/job/25559409142
Change-Id: I74da1dab08b335c59bdddef6581746a94107f370
Currently when data is downgraded by MOESI_AMD_Base-CorePair (e.g. due
to a replacement) this requires a 4-way handshake between the CorePair
and the dir. Specifically, the CorePair send a message telling the dir
it'd like to downgrade then, the dir sends an ACK back and then, the
CorePair writes the data back, and finally, the dir ACKs the writeback.
This is very inefficient and not representative of how modern protocols
downgrade a request. Accordingly, this commits updates the downgrade
support such that the CorePair writes back the data immediately and then
the dir ACKs it.
Thus, this approach requires only a 2-way handshake.
Change-Id: I7ebc85bb03e8ce46a8847e3240fc170120e9fcd6
Co-authored-by: Neeraj Surawar <neerajs@hyrule.cs.wisc.edu>
Bigs fixed of note:
1. The 'find' method has been fixed to work. This involved making
'children' a class implemented per-subclass as required.
2. The 'get_all_stats_of_name' method has been removed. This was not
working at all correctly and is largely doing what 'find' does.
2. The functionality to get an element in a vector via an attribute call
(i.e., self.vector1 == self.vector[1]) has been implemented this
maintaining backwards compatibility with the regular Python stats.
Change-Id: I31a4ccc723937018a3038dcdf491c82629ddbbb2
The SimStat Object is nothing more than a group of other SimStats and is
therefore logically a group. With this, functionality can be shared more
easily.
Change-Id: I5dce23a02d5871e640b422654ca063e590b1429a
When compiler tries to inline a vector construction with a default value
as default constructed ReplaceableEntry. It can complain about the
uninitialized member.
Let's provide basic initialization to the members.
Example codepath:
SignaturePathV2 constructor
-> GlobalHistoryEntry() as init_value to AssociativeSet
-> AssociativeSet initialize vector<Entry> with init_value
1. Adds newly added PyStat classes to "__init__.py", ensuring they can
all be accessed via a `m5.ext.pystats` import.
2. Simplifies the layout out "__init__.py" to just import all classes
from all files.
Change-Id: I43bfc5e7ff1aec837e661905304c6fb10b00c90e
This PR is doing the following:
1) Fixing memory attributes of partial translation entries (table walks)
2) Properly setting the cacheability of table walks
Fix#1168. Prevent logical instructions like AND, OR, and TEST from
having input dependencies on the previous value of the Zaps register
(ZF+AF+PF+SF) by having them set AF=0, rather than not modifying AF.
* The GCC used in the GCN-GPU images was increase from version 8 to
version 10. This was necessary due to PR #1145 which made GCC require
GCC >=10. This patch was previously part of #1161 but has been merged
into
this PR.
* A patch has been applied to ROCm-OpenCL-Runtime to fix a linking error
in which there were multiple definitions of `ret_val`. This issue is
highlighted here:
https://github.com/ROCm/ROCm-OpenCL-Runtime/issues/113.
This was previously part #1161 but has been moved into this PR.
* The Dockerfile's `RUN` command (built to layers in the Docker image)
have been refactored so sources and built objects are deleted in the
same RUN command as where they were built and installed. This reduces
the size of the image substantially: from 16.3GB down to 6.6GB.
* The `apt upgrade` has been removed. This step (previously at the start
of the file) did nothing of importance. Removing it saves both time
building the image and reduces the size of the image by a small amount.
* `--depth=1` is used when cloning repositories so the entire commit
tree
tree is not pulled each time. This saves some time when building the
image.
* `apt -y update` has been added where `apt -y install` is used so
CACHED image layers do not become an issue in the future if the image
were to be rebuilt.
Fix#1169. Break the input dependency of 32-bit and 64-bit 'mov'
micro-ops on the prior value in the destination register. Such a
dependency is required for 8-bit and 16-bit moves, as they do not
completely overwrite the value in the destination register. However, it
is unnecessary for 32-bit moves (which implicitly zero the upper 32
bits) and 64-bit moves.
This patch implements the fix by adding a new code template field inside
the generated constructors of X86StaticInst's, called `invalidate_srcs`,
which instruction implementations like `mov` can use to conditionally
invalidate particular source registers as needed. In `mov`'s case, this
is when the data size is 32 or 64 bits.
Change-Id: Ib2aef6be6da08752640ea3414b90efb7965be924
SDMA RLC queues do not currently remove their doorbell mapping. This can
cause issues re-registering the queue and prevents the pending doorbells
feature from working. In addition the data value of the doorbell (the
ring buffer rptr) is not saved, leading to UB when this workaround is
used.
This commit removes the doorbell mapping from the gpu device when the
SDMA engine unmaps an RLC queue and copies the next doorbell value to
the pending packet as was originally intended.
Change-Id: Ifd551450f439c065579afcf916f8ff192e7598ab
According to the Arm architecture reference manual, it is possible to
force the broadcast of the following TLBIs:
AArch64: TLBI VMALLE1, TLBI VAE1, TLBI ASIDE1, TLBI VAAE1, TLBI VALE1,
TLBI VAALE1, IC IALLU, TLBI RVAE1, TLBI RVAAE1, TLBI RVALE1, and TLBI
RVAALE1.
AArch32: BPIALL, TLBIALL, TLBIMVA, TLBIASID, DTLBIALL, DTLBIMVA,
DTLBIASID, ITLBIALL, ITLBIMVA, ITLBIASID, TLBIMVAA, ICIALLU, TLBIMVAL,
and TLBIMVAAL.
Via the HCR_EL2.FB bit
Change-Id: Ib11aa05cd202fadfbd9221db7a2043051196ecbd
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
When determining the cacheability of table walks,
SCTLR.C should only be used in stage1 EL1&0 translations.
Stage2 translations should rely on HCR_EL2.CD instead
Change-Id: I1b0830bc3fb5086f68d7a7a1560c7fed5d126d28
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Make table walks uncacheable if marked as uncacheable
in either inner or outer shareable domain
Change-Id: I5898a3b91b5b919e0beda6c6fe896394e3ab94df
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The Daily Tests are failing when downloading artifacts as part of the
`testlib-long-tests` matrix:
https://github.com/gem5/gem5/actions/runs/9250821764/job/25448583827.
It _could_ be that since upgrading to `actions/download@v4`, we're
hitting a limit as the `testlib-long-tests` are downloading every gem5
binary compiled in the `build-gem5` step, each with it's own
`actions/download` step, for every test.
This change adds a small job after `build-gem5` which creates a merged
artifact containing all the gem5 binaries then uses this to lessen the
number of times this action is called in such a short period of time.
Even if the bug still persists, this solution is neater than what was
there previously.
StoreThrough in VIPER when the TCP is disabled, GLC bit is set, or SLC
bit is set will bypass the TCP, but will temporarily allocate a cache
entry seemingly to handle write coalescing with valid blocks. It does
not attempt to evict a block if the set is full and the address is
invalid. This causes a panic if the set is full as there is no spare
cache entry to use temporarily to use for DataBlk manipulation. However,
a cache block is not required for this.
This commit removes using a cache block for StoreThrough with invalid
blocks as there is no existing data to coalesce with. It creates no
allocate variants of the actions needed in StoreThrough and pulls the
DataBlk information from the in_msg instead. Non-invalid blocks do not
have this panic as they have a cache entry already.
Fixes issues with StoreThroughs on more aggressive architectures like
MI300.
Change-Id: Id8687eccb991e967bb5292068cbe7686e0930d7d
Those AArch64 instructions/registers were labelled as executable
from EL3 only if SCR_EL3.NS == 1. This is not valid anymore
after the introduction of FEAT_SEL2
The new static analysis in GCC 13 finds issues with operand.hh. This
commit fixes the error so that gem5 compiles when BUILD_GPU is true.
Change-Id: I6f4b0d350f0cabb6e356de20a46e1ca65fd0da55
Those AArch64 instructions/registers were labelled as executable
from EL3 only if SCR_EL3.NS == 1. This is not valid anymore
after the introduction of FEAT_SEL2
Change-Id: Ie7b56f3fe779c3a99d4f0ef937c7c8ec0530b00e
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This is making it easier for TLBI instructions to share code. Common
code (under the form of tlbi* functions) are closely matching the
instruction description in the Arm pseudocode
Change-Id: If10c22fb4a7df2bcd0335e9761286ad3c458722b
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This commit addresses Jason's comment
(https://github.com/gem5/gem5/pull/996#discussion_r1613870880) which
highlighted putting the `_m5.stats.processDumpQueue` call in the
iteration through the `root` object in `get_simstat` caused this
function be potentially called many times when it only needs to be
called once. This chance moved this call to just before this iteration
and will tehrefore only be called once (if required) per `get_simstat`
execution.
Change-Id: I16908b6dee063a0df7877a19e215883963bfb081
The bit 0 of register should be 0 for jump address. Wrong handling the
jump address may cause infinite run or segment fault.
gem5 issue: https://github.com/gem5/gem5/issues/981
Fixed the issue that did not allow building TLM.
Build commands:
```bash
scons build/ARM/gem5.opt
scons setconfig build/ARM USE_SYSTEMC=n
scons --with-cxx-config --without-python --without-tcmalloc build/ARM/libgem5_opt.so
cd util/tlm
scons
```
Following this README, I tested it successfully with the simple examples:
https://gem5.googlesource.com/public/gem5/+/master/util/tlm/README
GitHub Issue: #591
Change-Id: If07fae2eb20ad62627e733573f61bc42d594f970
---------
Co-authored-by: Ivana Mitrovic <ivanamit91@gmail.com>
1. Thests here for the Scalar tasks are named appropriately. Not just
generic "SimStats tess".
2. We remove 'simstat' terminology. The correct word is "Pystats".
Change-Id: Idebc4e750f4be7f140ad6bff9c6772f580a24861
As Distribution inherits from Vector, it should be constructed with
a Dictionary of scalars (in our implementation, a dictionary mapping the
vector position's unique id for each bin and the value of that bin).
Change-Id: Ie603c248e5db4b6dd7f71cc453eebd78793f69a3
The big thing missing from the Vector stats was that each position in
the vector could have it's own unique id (a str, float, or int) and each
position in the vector can have its own description. Therefore, to add
this the Vector is represented as a dictionary mapping the unique ID to
a Pystat Scaler (whcih can have it's own unique description.
Change-Id: I3a8634f43298f6491300cf5a4f9d25dee8101808
This isn't a true Base class, it's just a Vector. In gem5 all Vectors
are Scalar Vectors. This change simplfies the naming.
Change-Id: Ib8881d854ab18de6acbf0fb200db2de6a43621a7
note: Due to #556 / #555, we don't support GCC 9. This PR removes gcc-8
which means gem5 would support GCC >= version 10.
The reason for removing gcc-8:
1. We already dropped support for gcc-9. I don't see any good reason to
support anything <9 as a result.
2. GCC is relatively old, and we're probably supporting a bit too many
compiler versions anyway. In Ubuntu 22.04, gcc-11 is downloaded by
default with `apt`. It doesn't seem many system are still using gcc.
3. There is a weird compiler bug in gcc-8 which is causes failure when
compiling gem5 since the inclusion of #1123. The error received is as
follows:
```sh
In file included from src/arch/riscv/tlb.hh:42,
from src/arch/riscv/mmu.hh:45,
from build/ALL/arch/riscv/generated/exec-g.cc.inc:14,
from build/ALL/arch/riscv/generated/generic_cpu_exec.cc:5:
src/arch/riscv/utility.hh: In instantiation of ‘FloatType gem5::RiscvISA::ftype(IntType) [with FloatType = float8_t; IntType = unsigned char]’:
build/ALL/arch/riscv/generated/exec-ns.cc.inc:38839:42: required from ‘gem5::Fault gem5::RiscvISAInst::Vfwcvt_xu_f_vMicro<ElemType>::execute(gem5::ExecContext*, gem5::trace::InstRecord*) const [with ElemType = float8_t; gem5::Fault = std::shared_ptr<gem5::FaultBase>]’
build/ALL/arch/riscv/generated/exec-ns.cc.inc:38856:16: required from here
src/arch/riscv/utility.hh:327:15: error: parameter ‘a’ set but not used [-Werror=unused-but-set-parameter]
ftype(IntType a) -> FloatType
~~~~~~~~^
src/arch/riscv/utility.hh: In instantiation of ‘IntType gem5::RiscvISA::f_to_wui(FloatType, uint_fast8_t) [with FloatType = float8_t; IntType = short unsigned int; uint_fast8_t = unsigned char]’:
build/ALL/arch/riscv/generated/exec-ns.cc.inc:38838:49: required from ‘gem5::Fault gem5::RiscvISAInst::Vfwcvt_xu_f_vMicro<ElemType>::execute(gem5::ExecContext*, gem5::trace::InstRecord*) const [with ElemType = float8_t; gem5::Fault = std::shared_ptr<gem5::FaultBase>]’
build/ALL/arch/riscv/generated/exec-ns.cc.inc:38856:16: required from here
src/arch/riscv/utility.hh:570:20: error: parameter ‘a’ set but not used [-Werror=unused-but-set-parameter]
f_to_wui(FloatType a, uint_fast8_t mode)
```
Note: This is currently causing our SST Daily tests to fail, and our
compiler tests to fail.
This change fixes#1148
I have only added an acknowledged return, as we dont ahve remote and
wrap mode so it can only be in stream mode.
Change-Id: I1882042d873ff0e9465c9491238554c8fbb9aa76
Those were not part of the performTlbi switch and simulation was
therefore panicking when they were encountered
Change-Id: Ifbe0b89e45539df4abc147ac5970b0caf0d9dfdc
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This commit fixes and refactors the implementation of viota. It also
overrides the generateDisassembly function in viota's macro/micro to
correctly print out the instruction when tacing/debugging.
For example, it changes from:
viota_m vd, vd, vs2, v0.t
to:
viota_m vd, vs2, v0.t
This adds two failsafes which may cause a panic on some machines. First,
check the host machine has the KVM XCR capability before calling getXCRs
or setXCRs. Second, ensure the x87 bit, which must always be one, will
always return at least one by modifying the return value in readMiscReg.
Change-Id: I5e778acc926a47443ef6cef29fabd84eb69bb9ba
This implements some missing loads and store that are commonly used in
applications with MFMA instructions to load 16-bit data types into
specific register locations: DS_READ_U16_D16, DS_READ_U16_D16_HI,
BUFFER_LOAD_SHORT_D16, BUFFER_LOAD_SHORT_D16_HI.
Change-Id: Ie22d81ef010328f4541553a9a674764dc16a9f4d
Add a unit test for the MXFP types (bf16, fp16, fp8, bf8). These types
are not currently operated on directly. Instead the are cast to float
values and then arithmetic is performed. As a result, the unit test
simply checks that when we convert a value from MXFP type to float and
back that the values of the MXFP type match. Exact values are used to
avoid discrepancies with rounding.
Can be run using scons build/VEGA_X86/unittests.opt .
Change-Id: I596e9368eb929d239dd2d917e3abd7927b15b71e
These instructions are used in some of the F16 MFMA example applications
to convert to/from floating point types.
Change-Id: I7426ea663ce11a39fe8c60c8006d8cca11cfaf07
This instruction is new in MI300 and is used in some of the example
applications used to test MFMAs.
Change-Id: I739f8ab2be6a93ee3b6bdc4120d0117724edb0d4
This adds the decodings for all of the matrix fused multiply add (MFMA)
and sparse matrix fused multiply accumulate (SMFMAC) instructions up to
and including MI300. This does not yet provide the implementation for
these instructions, however it is easier and less tedious to add them in
bulk rather that one at a time.
Change-Id: I5acd23ca8a26bdec843bead545d1f8820ad95b41
The microscaling formats (MXFP) and INT8 types require additional size
checks which are not needed for the current MFMA template. The size
check is done using a constexpr method exclusive to the MXFP type,
therefore create a special class for MXFP types. This is preferrable to
attempting to shoehorn into the existing template as it helps with
readability. Similar, INT8 requires a size check to determine number of
elements per VGPR, but it not an MXFP type. Create a special template
for that as well.
This additionally implements all of the MFMA types which have test cases
in the amd-lab-notes repository (https://github.com/amd/amd-lab-notes/).
The implementations were tested using the applications in the
matrix-cores subfolder and achieve L2 norms equivalent or better than
MI200 hardware.
Change-Id: Ia5ae89387149928905e7bcd25302ed3d1df6af38
This class can be used to load multiple operand dwords into an array and
then select bits from the span of that array. It handles cases where the
bits span two dwords (e.g., you have four dwords for a 128-bit value and
want to select bits 35:30) and cases where multiple values < 32-bits are
packed into a single dword (e.g., two bf16 values).
This is most useful for packed arrays and instructions which have more
than two dwords. Beyond two dwords, the operator[] overload of
VectorOperand is not available requiring additional logic to select from
an operand. This helper class handles that additional logic itself.
Change-Id: I74856d0f312f7549b3b6c405ab71eb2b174c70ac
The open compute project (OCP) microscaling formats (MX) are used in the
GPU model. The specification is available at [1]. This implements a C++
version of MXFP formats with many constraints that conform to the
specification.
Actually arithmetic is not performed directly on the MXFP types. They
are rather converted to fp32 and the computation is performed. For most
of these types this is acceptable for the GPU model as there are no
instruction which directly perform arithmetic on them. For example, the
DOT/MFMA instructions operating may first convert to FP32 and then
perform arithmetic.
Change-Id: I7235722627f7f66c291792b5dbf9e3ea2f67883e
Release of MI300X simulation capability:
- Implements the required MI300X features over MI200 (currently only
architecture flat scratch).
- Make the gpu-compute model use MI200 features when MI300X / gfx942 is
configured.
- Fix up the scratch_ instructions which are seem to be preferred in
debug hipcc builds over buffer_.
- Add mi300.py config similar to mi200.py. This config can optionally
use resources instead of command line args.
It appears we have been trying to read 64-bit arguments for ARM32 since
695583709b. I noticed that SYS_OPEN was
trying to read a really long string as the pathname argument and it
turned out it was reading from the wrong stack offset. With this change
I can successfully run some of the semihosting tests for ARM32.
Change-Id: Ie154052dac4211993fb6c4c99d93990123c2eacf
In BaseSemihosting::readString() we were using the len argument to
allocate a std::vector without checking whether the value makes any
sense. This resulted in a std::bad_alloc exception being raised prior to
https://github.com/gem5/gem5/pull/1142 for my semihosting tests. This
commit prevents semihosting from reading more than 64K for string
arguments which should be more than sufficient for any valid code.
Change-Id: I059669016ee2c5721fedb914595d0494f6cfd4cd
This commit fixes the implementation of vrgather instruction based on
rvv 1.0.
In section 16.4. Vector Register Gather Instructions,
> Vector-scalar and vector-immediate forms of the register gather are
also provided. These read one element from the source vector at the
given index, and write this value to the active elements of the
destination vector register. The index value in the scalar register and
the immediate, zero-extended to XLEN bits, are treated as unsigned
integers. If XLEN > SEW, the index value is not truncated to SEW bits.
The fix zero-extends the index value in the scalar register and the
immediate.
Add a config capable of simulating MI300X ISA (gfx942). This is similar
to the mi200.py config and uses the same scripts followed by some
tuneable parameters. This config optionally lets the user call the
runMI300GPU function with gem5 resources. This allows for something like
the following before a VIPER stdlib python is available:
```
import mi300
from gem5.resources.resource import obtain_resource
disk = obtain_resource("x86-gpu-fs-img")
kernel = obtain_resource("x86-linux-kernel-5.4.0-105-generic")
app = obtain_resource("square-gpu-test")
mi300.runMI300GPUFS("X86KvmCPU", disk, kernel, app)
```
Tested cold boot config, checkpoint create and restore, and using gem5
resources.
Change-Id: I50a13d7a3d207786b779bf7fd47a5645256b1e6a
Architected flat scratch is added in MI300 which store the scratch base
address in dedicated registers rather than in SGPRs. These registers are
used by scratch_ instructions. These are flat instruction which
explicitly target the private memory aperture. These instructions have a
different address calculation than global_ instructions.
This change implements architected flat scratch support, fixes the
address calculation of scratch_ instructions, and implements decodings
for some scratch_ instructions. Previous flat_ instructions which happen
to access the private memory aperture have no change in address
calculation. Since scratch_ instructions are identical to flat_
instruction except for address calculation, the decodings simply reuse
existing flat_ instruction definitions.
Change-Id: I1e1d15a2fbcc7a4a678157c35608f4f22b359e21
Add support for the following two extensions:
[Zvfh](https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#185-zvfh-vector-extension-for-half-precision-floating-point):
Vector Extension for Half-Precision Floating-Point
[Zvfhmin](https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#184-zvfhmin-vector-extension-for-minimal-half-precision-floating-point):
Vector Extension for Minimal Half-Precision Floating-Point
For instructions (`vfncvt[.rtz].x[u].f.w`) and (`vfwcvt.f.x[u].v`) which
will become defined when `SEW = 8`, a new template
`VectorFloatWideningAndNarrowingCvtDecodeBlock` is added and 8-bit
floating point type (`float8_t`) is defined.
The data type `float8_t` is introduced in the newer `3e` version of the
SoftFloat Package, however, the current version in use is `3d` which
does not include this definition. Despite this, `float8_t` is utilized
solely for constructing the `vfncvt[.rtz].x[u].f.w` and
`vfwcvt.f.x[u].v` instructions when `SEW = 8`. There are no operations
that directly manipulate data of the `float8_t` type.
This is the version for MI300. For the most part, it is the same as
MI200 with the exception of architected flat scratch (not yet
implemented in gem5) and therefore a new version enum is required.
Change-Id: Id18cd7b57c4eebd467c010a3f61e3117beb8d58a
- Adding this line as not specifiying GNU non executable stack was
throwing warnings when building m5
for ubuntu 24.04
Change-Id: I620c508be4090804698391cff671ba5091b053d7
These changes to sweep and sweep_hybrid for NVM allow them to run. I'm
not an expert on this, so I'm not sure if these are technically correct,
but they no longer fail when running
`build/X86/gem5.opt configs/nvm/sweep.py` and `build/X86/gem5.opt
configs/nvm/sweep_hybrid.py`
GitHub Issue: #669
The SIMPOINT_BEGIN should do nothing by default since it might be used
in various cases.
In
[https://www.mail-archive.com/gem5-users@gem5.org/msg22383.html](mailing
list), a user discovered a bug with the current
`simpoints-se-restore.py` example.
The bug is caused by the default behavior of the SIMPOINT_BEGIN exit
event.
When taking a checkpoint with `simpoints-se-checkpoint.py`, it stores
the future exit event scheduled at the beginning of the simulation. I
did not notice this when I wrote and tested the example script due to
the long print out log and my custom handler of the SIMPOINT_BEGIN exit
event.
In the restoring, the SIMPOINT_BEGIN exit event was triggered right
before the region end, so it resets the stats before the final stats
dump. Therefore, the simulation time is 0 as the user discovered.
This patch should fix this bug.
Change-Id: I800dfbd28d7b2c842864a1ab7d84b8f8e17b9b3c
This PR implements FEAT_MPAM on the CPU side. We define a MPAM system
registers and a mechanism
for tagging memory requests with the MPAM information bundle as
specified in existing documentation [1].
What this PR is *not* covering is the MPAM implementation in a MSC
(Memory System Component).
Which means at the moment it's only possible to have static partitioning
schemes (via the PartitioningPolicies
already part of gem5) and there is currently no way to dynamically
program partitions at runtime.
[1]: https://developer.arm.com/documentation/ddi0487/latest/
Quote from change[1]
> The RISC-V spec clarifies the CSR instruction operation, some of them
shall not read or write CSR by the hints of RD/RS1/uimm, but the
original version use the 'data != oldData' condition to determine
whether write or not, and always read CSR first.
See CSR instruction in spec:
Section 9.1 Page 56 of
https://github.com/riscv/riscv-isa-manual/releases/download/Ratified-IMAFDQC/riscv-spec-20191213.pdf
|||Register operand|||
|--- |--- |--- |--- |--- |
|Instruction|rd is x0|rs1 is x0|Reads CSR|Writes CSR|
|CSRRW|Yes|-|No|Yes|
|CSRRW|No|-|Yes|Yes|
|CSRRS/CSRRC|-|Yes|Yes|No|
|CSRRS/CSRRC|-|No|Yes|Yes|
|||Immediate operand|||
|Instruction|rd is x0|uimm = 0|Reads CSR|Writes
CSR|
|CSRRWI|Yes|-|No|Yes|
|CSRRWI|No|-|Yes|Yes|
|CSRRSI/CSRRCI|-|Yes|Yes|No|
|CSRRSI/CSRRCI|-|No|Yes|Yes|
The issue cause the ubuntu hanging because we shared the same status CSR
with `mstatus`, `sstatus` and `ustatus` and interrupt enabling CSR with
mip, sip and uip. We may need to read origin CSR without effect of
unmask bits to avoid override the bits of other CSR. Now the ubuntu can
work after the patch merged.
[1] https://gem5-review.googlesource.com/c/public/gem5/+/67717
The destination select should take a value of the selection size (dword,
word, or byte) starting at bit 0, move that to the selected destination,
and then apply the unused constraint (DST_U) to the remaining word or
bytes. Currently the code is selecting the word/byte currently being
iterated over, rather than the least significant word/byte. As a result,
any selection that is not word 0 or byte 0 will be replaced with the
original destination value at those bits. This results in the wrong
value.
This commit changes the orig bits to be the original dest value at the
lowest word / byte location. Tested with the mfma_i32_16x16x16i8 example
which uses an SDWA V_OR_B32 to pack i8 values into VGPRs for the MFMA.
Change-Id: I54ed819479a25fa9276d29a8f14f0fea7fd71afe
This PR includes a check for `m_pkt` being null and appropriately
handles that case. This issue was causing the Daily tests to fail.
Change-Id: I87142ca14ca4ab3d8306153a1cf34c2629a119ba
This patch is adding the NS bit to CHI requests to make sure they are
properly tagged according to their security
Change-Id: I33d3610edefbb5a05a6090e9125c35d4fb8bca58
Reviewed-by: Tiago Muck <tiago.muck@arm.com>
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The extended control registers were not being updated in the KVM thread
context nor updated in the KVM state. This was causing issues when
checkpointing since the XCR0 value was reverting to the default value
rather than what it was previously before the checkpoint. THis was
causing multiple applications to crash due to executing instructions
which are now illegal instructions due to XCR0 being incorrect.
This commit adds the XCR0 as a misc register similar to the exiting x86
control registers and adds all of the helper functions to access and set
the register value. It also adds support for updating the KVM CPU's
state with the register value and updating the thread context's misc reg
value so that it is checkpointed along with the other misc regs.
Note that this does *not* add support for XSAVE of the AVX state (i.e.,
the upper 128 bits of YMM registers). It does however fix the immediate
problem in issue #958 .
Change-Id: I97456c8b57cbc7b381bd4be94944ce6567a43c76
Includes fixes for several bugs reported via email, self found, and
internal reports. Also includes runs through Valgrind and UBsan. See
individual commits for more details.
The scalar cache is not being invalidated which causes stale data to be
left in the scalar cache between GPU kernels. This commit sends
invalidates to the scalar cache when the SQC is invalidated. This is a
sufficient baseline for simulation.
Since the number of invalidates might be larger than the mandatory queue
can hold and no flash invalidate mechanism exists in the VIPER protocol,
the command line option for the mandatory queue size is removed, which
is the same behavior as the SQC.
Change-Id: I1723f224711b04caa4c88beccfa8fb73ccf56572
The implementation of SYS_FLEN was missing, which caused picolibc to
treat this file as not implemented. Additionally, there was a bug in the
SYS_READ call that was comparing the wrong variable against the passed
buffer length. It was comparing the current file position against the
buffer length instead of the number of written bytes. Finally, pos was
unititialized which could result in spurious errors.
Change-Id: I8b487a79df5970a5001d3fef08d5579bb4aa0dd0
This prints what values are read/written to LDS and the previous value
on write. This is useful for debugging problems with LDS instructions.
Change-Id: I30063327bec1a1a808914a018467d5d78d5d58b4
SDMA ptePde packets are generating a warning that a GART address is
missing, causing a wrong address to be clobbered by the operation.
This commit fixes this by converting the GART address when the queue is
running in privledged mode, which is the only mode allowed to use GART
addresses. This removes the warnings and writes to the correct memory
region.
Change-Id: I64acac308db2431c5996b876bf4cda704f51cf25
The VOP3 instruction encoding generally states that ABS/NEG modifiers in
the instruction encoding are only valid on floating point data types.
This is currently coded in gem5 to mean floating point *instructions*.
For untyped instructions like V_CNDMASK_B32, we don't actually know what
the data type is. We must trust that the compiler did not attempt to
apply these bits to non-FP data types.
This commit simply removes the asserts. The ABS/NEG modifiers are
therefore ignored which is consistent with the ISA documentation.
This is done on the lane manipulation instructions V_CNDMASK_B32,
V_READLINE_B32, and V_WRITELANE_B32 which are typically used to mask off
or move data between registers. Other bitwise instructions (e.g.,
V_OR_B32) keep the asserts as bitwise operations on FP types are
genernally illegal in languages like C++.
Change-Id: I478c5272ba96383a063b2828de21d60948b25c8f
Fixes several memory leaks, mostly of small and medium severity. Fixes
mismatched new/new[] and delete/delete[] calls.
Change-Id: Iedafc409389bd94e45f330bc587d6d72d1971219
The address has one too many zeros and is therefore placed in a memory
region usually used for system memory. As a result this causes failure
when trying to run a simulation with a huge amount of memory.
Change the address to be within the C000'0000h - FFFF'FFFFh X86 I/O hole
as was intended.
Change-Id: I5d03ac19ea3b2c01a8c431073c12fa1868b3df24
Three main fixes:
- Remove the initDynOperandInfo. UBSAN errors and exits due to things
not being captured properly. After a few failed attempts playing with
the capture list, just move the lambda to a new method.
- Invalid data type size for some thread mask instructions. This might
actually have caused silent bugs when the thread id was > 31.
- Alignment issues with the operands.
Change-Id: I0297e10df0f0ab9730b6f1bd132602cd36b5e7ac
From sepc
> Instructions that access a non-existent CSR are reserved. Attempts to
access a CSR without appropriate privilege level raise
illegal-instruction exceptions or, as described in Section 13.6.1,
virtual-instruction exceptions. Attempts to write a read-only register
raise illegal-instruction exceptions. A read/write register might also
contain some bits that are read-only, in which case writes to the
read-only bits are ignored.
Setting the bit not in the mask should be ignore rather than raise the
illegal exception. The unmask bits of xstatus CSR are `WPRI`, the
unmasks bits of xie are `RO`(above priv v1.12) or `WPRI`(priv v1.11 and
priv v1.10), the unmask bits of xip CSR are `RO`(above priv v1.12) or
`WPRI`(priv v1.11) or `WIRI` (priv v1.10).
Note: The workload of `riscv-ubuntu-20.04-boot` uses the priv v1.10.
More details please see the `RISC-V spec: Privileged Architecture`
v1.10:
https://github.com/riscv/riscv-isa-manual/releases/tag/riscv-priv-1.10
v1.11(20190608):
https://github.com/riscv/riscv-isa-manual/releases/tag/Ratified-IMFDQC-and-Priv-v1.11
v1.12(20211213):
https://github.com/riscv/riscv-isa-manual/releases/tag/Priv-v1.12
Change-Id: I5d6e964e99b30b71da3dc267cd1575665d922633
Currently, the filesRootDir is prepended for all paths that do not start
with '/'. However, we should not be doing this for the special files :tt
and :semihosting-features. Noticed this while testing semihosting with a
non-empty filesRootDir.
Change-Id: I156c8b680cb71cdc88788be3b0e93fc1d52e11e5
The implementation of SYS_FLEN was missing, which caused picolibc to
treat this file as not implemented. Additionally, there was a bug in
the SYS_READ call that was comparing the wrong variable against the
passed buffer length. It was comparing the current file position against
the buffer length instead of the number of written bytes.
Finally, pos was unititialized which could result in spurious errors.
Change-Id: I8b487a79df5970a5001d3fef08d5579bb4aa0dd0
This PR added RISC-V Integer Conditional Operations Extension, which is
in the RVA23U64 Profile Mandatory Base. And the performance of
conditional move instructions in micro-architecture is an interesting
point to explore.
Zicond instructions added: czero.eqz, czero.nez
Changes based on spec:
https://github.com/riscvarchive/riscv-zicond/releases/download/v1.0.1/riscv-zicond_1.0.1.pdf
As discussed in https://github.com/orgs/gem5/discussions/954:
In the refactor made by commit f65df9b959 conditional indirect
branches are no longer updated in the indirect predictor.
This kind of branches do not exist in x86 neither arm, but they are
present in PowerPC.
This patch, enables the indirect predictor to track this kind of
branches.
With this patch we are adding the specific logic required to tag memory
requests from the PE with MPAM data. This is happening within the MMU
before a translation is completed
Change-Id: Ifecb285244dd469a639150d69a7e884fe3c441be
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
This is for fast retrieval of the highest implemented
exception level
Change-Id: Id631c2b999d46a8b79570e4043ae04bc2b2e7531
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
The extended control registers were not being updated in the KVM thread
context nor updated in the KVM state. This was causing issues when
checkpointing since the XCR0 value was reverting to the default value
rather than what it was previously before the checkpoint. THis was
causing multiple applications to crash due to executing instructions
which are now illegal instructions due to XCR0 being incorrect.
This commit adds the XCR0 as a misc register similar to the exiting x86
control registers and adds all of the helper functions to access and set
the register value. It also adds support for updating the KVM CPU's
state with the register value and updating the thread context's misc reg
value so that it is checkpointed along with the other misc regs.
Note that this does *not* add support for XSAVE of the AVX state (i.e.,
the upper 128 bits of YMM registers). It does however fix the immediate
problem in issue #958 .
A checkpoint upgrader is also provided to add the default value of XCR0
if the checkpoint tag is missing.
Change-Id: I97456c8b57cbc7b381bd4be94944ce6567a43c76
In the payload event queue in Gem5ToTlmBridgeBase, the phase is checked
twice for BEGIN_RESP. This commit removes the second if clause since it
is unnecessary.
Duplicate if clause in line 234 & line 256
dd2689905f/src/systemc/tlm_bridge/gem5_to_tlm.cc (L234-L267)
please correct me if I am missing something important
Fix#1055. Prioritize committing from exiting threads before we consider
other threads using the specified SMT commit policy. All instructions in
the ROB for exiting threads should already have been squashed. Thus,
this ensures that the ROB instruction queues for all exiting threads
will be empty at the end of the current cycle, avoiding the assertion
failure encountered in #1055.
Change-Id: Ib0178a1aa6e94bce2b6c49dd87750e82776639dc
Fix#1042. Clear the current fetch macro-op if the instruction
initiating the squash is the last micro-op in its macro-op.
Change-Id: I77f60334771277e47f19573d4067b3a7bc5488b2
Fix#1044. This patch adds checks for message types (PUTX_COPY, DATA,
DATA_EXCLUSIVE) that contain data blocks but were missing from the
original `functionalRead` method in MESI Three-Level messages.
Change-Id: I0cedc314166c9cc037bf20f5b7fef5552dd1253c
A usual system register read/write pattern is something like the
following
```
switch(el) {
case EL1:
tc->readMiscReg(REG_EL1);
case EL2:
tc->readMiscReg(REG_EL2);
case EL3:
tc->readMiscReg(REG_EL3);
}
```
To avoid repeating these switch statements all over gem5, we define
templated functions which have
an accessor struct as a template parameter. These accessor will help
populating the templated switch
construct. We provide the FAR register accessor as an example. The
accessor should define the following
fields: (type, el0, el1, el2, el3)
Example:
```
struct FarAccessor
{
using type = RegVal;
static const MiscRegIndex el0 = NUM_MISCREGS;
static const MiscRegIndex el1 = MISCREG_FAR_EL1;
static const MiscRegIndex el2 = MISCREG_FAR_EL2;
static const MiscRegIndex el3 = MISCREG_FAR_EL3;
};
```
Fix#1064 by adding support for hardware performance counters on hybrid
architectures like Intel Alder Lake.
Hybrid architectures have multiple types of cores, each of which require
the instantiation of a separate performance counter. The KVM CPU's
PerfKvmCounter class was not aware of this, any only instantiated a
single performance counter, implicitly bound to the P-core only. This
meant that if gem5 ever ran on an E-core, the various hardware
performance counters would not get updated properly, in some cases
always zero (e.g., for the number of instructions executed).
This patch adds support for hybrid host architectures as follows. First,
we convert PerfKvmCounter into an abstract class, which has two concrete
implementations: SimplePerfKvmCounter and HybridPerfKvmCounter. The
former is used for non-hybrid architectures or for non-hardware
performance counters and is functionally equivalent to the prior
implementation of PerfKvmCounter. The latter is used for instantiating
hardware performance counters (i.e., of type PERF_TYPE_HARDWARE) on
hybrid host architectures. It does so by internally instantiating two
SimplePerfKvmCounters, one for a P-core and one for an E-core. Upon
read, it sums the results of reading the two internal counters.
Change-Id: If64fcb0e2fcc1b3a6a37d77455c2b21e1fc81150
Use it accordingly in the faulting/exception logic
Change-Id: I2f6360d04698b6fb7188e776f1d6966e99ce19b1
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
This is adding two templated functions for reading/writing
system registers (MiscRegs). It is introducing them inside
a new misc_regs namespace.
Change-Id: I21233337c057673d46d1147971ebabbfc2c2bb6a
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
This reverts commit c1de2b8762. We revert
the commit as Ruby does not use get_runtime_isa anymore after [1]
[1]: https://github.com/gem5/gem5/pull/241
Change-Id: Iaac8d64194bbd53a9b1a57a796ff92f763c75a87
This PR is refactoring the Arm PageTableWalker in the following way:
1) Simplifying the currState handling logic (mainly the tear down)
2) Amending the TlbTestInterface APIs to use a RequestPtr reference
3) Use finalizePhysical even when MMU is off, which means allowing
memory mapped m5ops to work also in that circumstance
Remove the requirement in TreePLRU's implementation that the number of
leaves (i.e., the number of cache ways) be a power of two. Firstly, on
some recent processors, this is not the case---for example, Intel Golden
Cove's L1D has 12 ways. Secondly, The implementation of TreePLRU appears
to work just fine as-is with a way count that's not a power of two.
Change-Id: If2a27dc5bbe7a8e96684f79ce791df5c0b582230
Now that the Request has been made an Extensible object, it
can carry within itself much more data. It makes sense
to pass it to the TlbTestInterface as more information about
the table walk can be extracted from it.
This is also aligning with the testTranslation utility which
is expecting a request reference as first argument.
Change-Id: I3dbc9a81d6b4bcc1801246ba7eb4136774d8f3c7
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
They both make final checks to the VA->PA translation before
relinquishing control back to the translate client (usually
CPU code)
Change-Id: Ib0a9da25404248c22c6a240817d2f50f0913fdf7
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
The finalizePhysical is just checking if the physical
address falls within the m5op region (if using mmapped
m5ops). There's not reason why we shouldn't enable it
with virtual memory off
Change-Id: I5ab80fd4e7886743abd4b7d85937b72253b578d3
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
We also unify the fault handling logic; rather than cleaning
up the WalkerState in several places scattered throughout the
walking code, we handle faults in the top level method
Change-Id: Ia22fb6f27044ff445fffbab228777a48efa473cb
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
It's more efficient to pass a reference of the tester to the
TableWalkers. In this way a table walk check is tested directly
from the walkers instead of going through the MMU every time.
Change-Id: I9820dbabb8b551981005a65efa54a76b1a027541
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
This is done in order to differentiate between EL0 (unprivileged) and
EL1. Effectively it won't change much as most of the decisions are
now taken according to the translation regime which will be the
same regardless (EL10)
Change-Id: I218037e9c19cf638aff05c51869e439204d9af69
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Add support for event overflows when the host thread migrates across
differnt types of cores on a hybrid host architecture. This patch
achieves this by simply halving the sample period for each performance
counter. Since there are two types of cores, this guarantees that an
overflow event will trigger before N events occur, where N is the
requested period (e.g., number of instructions to simulate). This
may result in many early triggers (up to log2(N)) before the requested
period is reached. However, gem5's existing bookkeeping logic already
handles this case properly: if fewer events than requested occurred,
it will set a new period (N - observed) and resume execution. This loop
will exit once N events have actually occurred.
Change-Id: Iff85237da1ae1aa25bc2045fbf9091726291fe36
Fix#1064 by adding support for hardware performance counters on hybrid
architectures like Intel Alder Lake.
Hybrid architectures have multiple types of cores, each of which require
the instantiation of a separate performance counter. The KVM CPU's
PerfKvmCounter class was not aware of this, any only instantiated a
single performance counter, implicitly bound to the P-core only. This
meant that if gem5 ever ran on an E-core, the various hardware
performance counters would not get updated properly, in some cases
always zero (e.g., for the number of instructions executed).
This patch adds support for hybrid host architectures as follows. First,
we convert PerfKvmCounter into an abstract class, which has two
concrete implementations: SimplePerfKvmCounter and HybridPerfKvmCounter.
The former is used for non-hybrid architectures or for non-hardware
performance counters and is functionally equivalent to the prior
implementation of PerfKvmCounter. The latter is used for instantiating
hardware performance counters (i.e., of type PERF_TYPE_HARDWARE) on
hybrid host architectures. It does so by internally instantiating two
SimplePerfKvmCounters, one for a P-core and one for an E-core. Upon
read, it sums the results of reading the two internal counters.
Change-Id: If64fcb0e2fcc1b3a6a37d77455c2b21e1fc81150
- After removal of the ClientWrapper class, the mocking of clients needs
to be changed to _create_client function.
- Commented failing tests due to persistence issues.
The persistence is being caused as the new mocked clients
not being used as the older clients are persisting over
the tests.
Change-Id: Ie342c9fc8103504dd12f49ae30b3bf62d189ce1d
- There was a bug in JSONClient when searching
for resoruces. The id was not checked and
the booleans were not set to true when
optional search queries like resource_version
and gem5_version are not passed.
Change-Id: I4aa7c5388035144ec6864d57130ad09e6709692e
The "linux/limits.h" equivalent on Apple systems is "sys/syslimits.h".
By adding an include guard to include the correct header dependent on
the host system, we can compile m5term on Mac OS systems.
This is not needed with upload-artifact v4 directories are archived and
compressed by default.
This zip step was also causing Daily/Weekly test failures due to not
running `apt update` before the `apt install` for the zip utility. Ergo
this patch fixes these errors.
A user reported a bug with the SSE4.1 version of memcmp in libc. When
enabled the simulated program crashes with SIGILL. After attempting all
fixes recommended by Intel SDM and still not working, turning the bit
off instead.
Similar, the default XSAVE functionality is not completely implemented
for AVX and newer ISA extensions. Therefore, there is not much point to
claiming to support the more advanced versions of XSAVE (XSAVEOPT,
XSAVEC, XSAVES, and XGETBV with ECX=1).
Note that none of these bits are enabled for non-GPU full system
simulations (see src/arch/x86/X86ISA.py). This only impacts GPUFS
simulations.
Change-Id: I8eb7bf0f2a0a29226095e7889fec9c1e8a65f88f
Due to an oversight, the PyUnit tests were not being run as part of the
gem5 CI tests. This was because they are located in "tests/pyunit"
instead of "tests/gem5", where the CI GitHub Action workflow searched
for tests to run and where all other tests reside.
This adds the Pyunit tests as a seperate job in the CI GitHub Action's
workflow.
Due to an oversight, the PyUnit tests were not being run as part of the
gem5 CI tests. This was because they are located in "tests/pyunit"
instead of "tests/gem5", where the CI GitHub Action workflow searched
for tests to run and where all other tests reside.
This adds the Pyunit tests as a seperate job in the CI GitHub Action's
workflow.
Change-Id: I63d93571fde11c19bf3d281c034eddf4b455ae4e
This PR introduces a missing pice of far atomic implementation. This
pull request incorporates several changes:
- Enable 2-level and 4-level (and N-level) cache hierarchies, removing
Atomic_NoWait transactions
- Fix Unique Near policy implementation that raised abort
- Add support for alloc_on_atomic == False. Enables Far Atomics on
systems where the HNF does not allocate evicted lines at LLC (Like in
WriteUpdate).
Movfp instruction did not account for only copying the lower half of src
register if dataSize is 4.
GitHub Issue: #893
I used the test code in issue #893 to verify the fix is working.
Hi, we've noticed some issues with the Uart8250 device when using it as
the Linux console. Sometimes the Uart interrupt would remain constantly
posted, so Linux would continue to try and handle it, effectively
resulting in an infinite loop. With this patch, I'm no longer seeing any
issues, but my testing has been limited to configurations and workloads
we're interested in at Imagination, so please let me know if there's
some other tests I should run or if you notice any other issues.
This patch fixes several issues with interrupt posting and clearing in
the uart8250 device.
The "status" member variable and the console interrupt should be kept in
sync. However, in one code path in readIir, the interrupt bit was being
cleared in the status variable but not in the platform controller.
Additionally, in some code paths, the interrupts would be cleared in the
status variable and in the interrupt controller, but a future interrupt
would remain scheduled, causing a spurious interrupt and setting a bit
in status to 1.
These issues can confuse the kernel and result in an ininite interrupt
handling loop.
Another issue is related to the fact that there are two interrupt causes
(TX and RX) and both of them can be valid at the same time. When one of
them becomes no longer valid, we should check the status of the other
one before clearing the interrupt.
This patch addresses the issues listed above and refactors the interrupt
clearing logic to reduce repetition.
Fixes#1033
In the BaseCPU object _uncached_interrupt_response_ports is a class
variable, not an instance variable. #1004 changed the explicit
self._uncached_interrupt_response_ports to use extend. This caused the
list of ports to be extended *for all cores*, which caused problems when
using a system with more than 1 core.
This reverts the `extend` part of the change, but keeps the rest.
Change-Id: I6dc7d6da6763048d82960229d34933a3a2ac36e0
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
This bumps the docker image used to build GPU applications for input to
GPUFS simulations from ROCm 5.4.2 to ROCm 6.0.2 and Ubuntu from 20.04 to
22.04. This matches the versions in gem5-resources#29 .
Several notes were added to the Dockerfile to describe where the RUN
commands come from. A README.md is also added to clarify that this is
not a disk image for GPUFS and is only used to build applications.
Change-Id: I9ada99e2ed1854cb7adb76f2a1fa662bab398f86
The GPU device currently supports large BAR which means that the driver
can write directly to GPU memory over the PCI bus without using SDMA or
PM4 packets. The gem5 PCI interface only provides an atomic interface
for BAR reads/writes, which means the values cannot go through timing
mode Ruby caches. This causes bugs as the TCC cache is allowed to keep
clean data between kernels for performance reasons. If there is a BAR
write directly to memory bypassing the cache, the value in the cache is
stale and must be invalidated.
In this commit a TCC invalidate is generated for all writes over PCI
that go directly to GPU memory. This will also invalidate TCP along the
way if necessary. This currently relies on the driver synchonization
which only allows BAR writes in between kernels. Therefore, the cache
should only be in I or V state.
To handle a race condition between invalidates and launching the next
kernel, the invalidates return a response and the GPU command processor
will wait for all TCC invalidates to be complete before launching the
next kernel.
This fixes issues with stale data in nanoGPT and possibly PENNANT.
There was some inconsistency in the GitHub Workflow files on using
'ubuntu-latest' (which gets the latest Ubuntu version) or
'ubuntu-22.04'. To keep things consistent 'ubuntu-latest' is now used in
all cases. This also saves us updating workloads upon release of a new
Ubuntu version.
gem5.fast does not currently build if the GPU model is built. This fixes
the array-bounds warnings allowing gem5.fast to build again.
Change-Id: I463c2847c3ecfd2257a70418fa247090b0493f9b
v3.0.0 of pre-commit/action caused a deprecation warning in actions.
v3.0.1 was released to deal with this.
Change-Id: Ib5654e465565ad4356754ac097983aec4166b98f
We only test the latest LTS Ubuntu release with min-deps. With 24.04, we
no longer require the 22.04 min dependencies image.
Change-Id: I4b3d668c1f9d10c2b6071848e6daada6c763b5e7
This change ensures all our tests run on our most recent supported LTS
release of Ubuntu.
In the case of compiler tests we still test 22.04 all-dep but test 24.04
all-dep and min-dep (i.e., we drop 22.04 min-dep as it's somewhat
redundant).
Change-Id: I63666d1017594b496523a48e5112a8994f57885f
Speciftying a DevContainer in gem5 allows for users to quickly create an
environment in which they can develop, build, and run gem5. The
".devcontainer/devcontainer.json" file specifies the properties of the
container. In this commit they are as follows:
1. The Docker image ghcr.io/gem5/devcontainer. This is built from
"util/dockerfiles/devcontainer". This Dockerfile provides all
dependencies and a pre-built gem5 binary from the current main branch
(added to "/usr/local/bin"). In order to support this Docker container
on different platforms we use the Docker multi-platform feature. As
such, this must be built using `docker buildx bake devcontainer --push`
which reads the `docker-bake.hcl file for the specification of the
multi-platform image.
2. Visual Studio extensions. This is a list of Visual Studio Code
extensions useful when developing gem5. They are automatically added the
Visual Studio dev container.
3. Features. Features are enhancemets that can be added to a
DevContainer. Normally they are libraries and other commonly used tools
to be included in the Container. As we have our dependencies specified
in the Dockerfile here we select one to enable Docker inside the
container, one to enable the Github CLI, one to improve Linting, and
finally one to enable the vscode CLI.
4. The On Create Command : This command allows us to specify commands to
be run after the DevContainer is created. In this case we execute
".devcontainer/on-create.sh" which, right now, refreshes the git index
and installed the pre-commit checks.
Fix issue #1004. When enabling SMT with the O3 cpu, only the first
interrupts object was getting initialized properly. This patch
initializes all interrupts objects, one per SMT thread.
Change-Id: I300782b645bd8ea3ef2497278fb73125ab4bf495
This commit adds more detailed instruction types for RISC-V Vector.
Concretely, it substitutes VectorIntegerArith, VectorFloatArith,
VectorIntegerReduce and VectorFloatReduce with more specific types
related to the operation that each instruction (e.g., VectorIntegerAdd
or VectorIntegerMult).
Additionaly, fixes two RISC-V instruction types (VectorXXX) that were
used in ARM SVE, placing the proper SimdXXX ones.
Change-Id: I31774fa6a7cd249abfffec68d11d3d77f08ad70b
CC @adriaarmejach
Add a generic cache template to construct internal storage structures.
Also add some example use cases by converting the prefetcher tables to
use this new library.
AssociativeSet can reuse most of the generic cache library code with the
addition of a secure bit. This reduces duplicated code.
Change-Id: I008ef79b0dd5f95418a3fb79396aeb0a6c601784
The tagged entry can be derived from the generic cache entry and add the secure
flag that it needs. This reduces code duplication.
Change-Id: I7ff0bddc40604a8a789036a6300eabda40339a0f
The DCPT table is better built using the generic cache library since we do not
need the secure bit.
Change-Id: I8a4a8d3dab7fbc3bbc816107492978ac7f3f5934
The frequency table is better built using the generic cache library instead of the
AssociativeSet since the secure bit is not needed for this structure.
Change-Id: Ie3b6442235daec7b350c608ad1380bed58f5ccf4
Add a generic cache library modeled after AssociativeSet that can be used for
constructing internal caching structures.
Change-Id: I1767309ed01f52672b32810636a09142ff23242f
Now that we are able to provide a view of the cache hierarchy from
the python world, we can start generating DTB entries for caches
and more specifically to properly fill the next-level-cache and
cache-level properties
Change-Id: Iba9ea08fe605f77a353c9e64d62b04b80478b4e2
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
One of things we miss in gem5 is the capability to neatly compose the
cache hierarchy of CPUs and clusters of CPUs. The BaseCPU
addPrivateSplitL1Caches and addTwoLevelCacheHierarchy APIs have
historically been used to bind cache levels together.
These APIs have been superseeded by the introduction of the Cache
hierarchy abstraction in the standard library. The standard library
makes it cleaner for a user to quickly instantiate a hierarchy of caches
with few lines of code. While this removes a lot of complexity for a
user, the Hierarchy objects still have little information about their
internal topology.
To address this problem, this patch adds a tree data structure to the
AbstractCacheHierarchy class, where every node of the tree represent
a cache in the hierarchy. In this way we will expose APIs for traversing
and querying the tree.
For example a 2 CPUs system with private L1, private L2 and shared L3
will contain the following tree:
[root]
|
[L3]
/\
/ \
[L2] [L2]
| |
[L1] [L1]
Change-Id: I78fe6ad094f0938ff9bed191fb10b9e841418692
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The GPU device currently supports large BAR which means that the driver
can write directly to GPU memory over the PCI bus without using SDMA or
PM4 packets. The gem5 PCI interface only provides an atomic interface
for BAR reads/writes, which means the values cannot go through timing
mode Ruby caches. This causes bugs as the TCC cache is allowed to keep
clean data between kernels for performance reasons. If there is a BAR
write directly to memory bypassing the cache, the value in the cache is
stale and must be invalidated.
In this commit a TCC invalidate is generated for all writes over PCI
that go directly to GPU memory. This will also invalidate TCP along the
way if necessary. This currently relies on the driver synchonization
which only allows BAR writes in between kernels. Therefore, the cache
should only be in I or V state.
To handle a race condition between invalidates and launching the next
kernel, the invalidates return a response and the GPU command processor
will wait for all TCC invalidates to be complete before launching the
next kernel.
This fixes issues with stale data in nanoGPT and possibly PENNANT.
Change-Id: I8e1290f842122682c271e5508a48037055bfbcdf
The GPUDynInst for sending memory requests through the CUs data port
is required but only used for DPRINTFs. Relax this constraint so that
the methods can be reused for requests such as probes generated by the
GPU device.
Change-Id: I16094e400968225596370b684d6471580888d98a
One of things we miss in gem5 is the capability to neatly compose the
cache hierarchy of CPUs and clusters of CPUs. The BaseCPU
addPrivateSplitL1Caches and addTwoLevelCacheHierarchy APIs have
historically been used to bind cache levels together.
These APIs have been superseded by the introduction of the Cache
hierarchy abstraction in the standard library. The standard library
makes it cleaner for a user to quickly instantiate a hierarchy of caches
with few lines of code. While this removes a lot of complexity for a
user, the Hierarchy objects still have little information about their
internal topology.
To address this problem, this patch adds a tree data structure to the
AbstractCacheHierarchy class, where every node of the tree represent
a cache in the hierarchy. In this way we will expose APIs for traversing
and querying the tree.
For example a 2 CPUs system with private L1, private L2 and shared L3
will contain the following tree:
[root]
|
[L3]
/\
/ \
[L2] [L2]
| |
[L1] [L1]
This PR is offloading some of the partitioning logic to the partitioning
manager, effectively changing
the partitioning interface. Rather than always relying on the
PartitionFieldExtention data structure to
convey partition IDs, we make it implementation defined by introducing
the partitioning manager abstraction.
We want user to be able to extract the partitionId more flexibly and
this requires using a SimObject.
Users can extend the PartitioningManager, overriding the
readPacketPartitionId, therefore providing their
own mean of injecting/extracting partitioning data from a packet
M5Ops C / C++ functions partially use 64 bit arguments and return value.
In general, 64 bit arguments and return values are possible for 32 bit
RISC-V systems as well, since the arguments and the return value is
split into two registers. However, at the moment, this does not work for
32 bit RISC-V systems on the simulator side, since there is a one to one
mapping between argument registers and m5op function parameters.
To solve this problem, the get() function of the RISC-V reg_abi is
updated. It now will merge two registers if there is a 64 bit argument.
For this, the function code has to be passed to the get() function. The
default value of this function code is set to 0xF00, since 0x00 is
already used for M5_ARM. The parameter list of other get() functions for
argument return is also extended by this function code parameter with
the keyword [[maybe_unused]].
To enable a return value of size 64 bit, a0 is assigned with the lower
32 bit and a1 with the higher 32 bit.
Related Issue: https://github.com/gem5/gem5/issues/881
The new ISA-agnostic interface is the PartitionManager.
We therefore make the PartitionFieldExtention private to the
Arm implementation of memory partitioning (FEAT_MPAM)
Any other partitioning implementation should override the
PartitionManager::readPacketPartitionID to provide a mean
for extracting partitioning data (partition_id) from the
incoming Packet.
With this commit we also define an MPAM MSC which is
supposed to be the partitioning manager for the
Memory System Component
Change-Id: I6959ace0c0cbca549dcc1aacd53dff223b5fe328
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Add alternative implementation to far atomics when the flag alloc_on_commit
is false. The implementation fetches the data, performs the atomic and
writes back the cache line to main memory.
Co-authored-by: Fabian Schätzle <f.schaetzle@fz-juelich.de>
Change-Id: I8797fbc68448e1866a292f4afeedd3613113dddd
This SimObject can be used to quickly test the statistics are
functioning correctly. The SimObject schedules a single event which sets
the statistics to values dependent on the SimObject params.
With this commit the "Scalar" stats have a StatTester subclass that can
be used for testing. More can be added as required.
Tests are included to check our Scalar SimStat functionality.
This has the SimObjects defined in "src/test_objects" only be compiled
into the gem5 binary if the Kconfig 'USE_TEST_OBJECTS" == 'y'. This
happens in two cases:
1. When 'ALL/gem5' is compiled via "build_opts".
2. When tests are run via "./tests/main.py".
Change-Id: I2330008fd7c7900de5f4de142b8ac89ef4e351ce
This SimObject can be used to quickly test the statistics are
functioning correctly. The SimObject schedules a single event which sets
the statistics to values dependent on the SimObject params.
With this commit the "Scalar" stats have a StatTester subclass that can
be used for testing. More can be added as required.
Tests are included to check our Scalar SimStat functionality.
Change-Id: I78fa5d9a0c3fc7115bd6c6d3410a5436aaa47f55
When the `statistics::nozero` flag is set gem5 does not output that stat
if its value is zero. This was not the case for SimStats which output in
this case. This patch correct this behavior.
To make Atomic transaction recursive and enable 2-level config,
remove AtomicReturn_NoWait and other level-dependent code
GitHub Issue: https://github.com/gem5/gem5/issues/882
Change-Id: Iac468cdb8a3b5914c8f05c5cedde866ce85f359a
This will just call the _m5.range.isSubset method
Change-Id: If747819a008a8ed20796b4efd42a42e5c3a8d7d9
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This is a followed up fix to #791 mem-ruby: Fix possible dirty line loss
in CHI when ReadShared hit on UD line.
UD_RU line may have stale data since the upstream could have updated the
line, so its local cache line data is treated as invalid
(dataValid=false). But when the line is evicted, it must be written back
to downstream because the upstream may have the line in clean state
(UC). This change fixes it by performing copy back the UD_RU line while
keeping its dataValid as false.
Example error case:
- L3 was in UD_RSC and being evicted without back-invalidation. LLC (HN)
was in RU state.
- Because there's still upstream sharer, L3 sends WriteClean.
- Because the data state was unique and dirty, L3 sends CBWrData_UD_PD.
- LLC becomes UD_RU.
- When the line is evicted from LLC (LocalHN_Eviction), the line is just
dropped, causing the loss of the dirty copy
Co-authored-by: Minje Jun <minje.jun@samsung.com>
The `reset_vect` has exist for a long time and `reset_vect` will not
effect if the user gonna to use customized reset_vect. The CL added the
`auto_reset_vect` to let the config determine the `reset_vect` from
workload entry point or user-specified
Ref: https://gem5-review.googlesource.com/c/public/gem5/+/42053
Change-Id: I928c0dc42aaa85ceabf8d75f9654486496e0ffee
This change sets the `release` of the ARM board at the config file
instead of overriding the release on the ArmBoard. This change partially
solves issue 932 as the system taking and restoring the checkpoint is
consistent across KVM and timing CPUs respectively.
Signed-off-by: Kaustav Goswami <kggoswami@ucdavis.edu>
Fix#988. Rewrite statxFunc and copyOutStatxBuf to use platform-agnostic
stat system call, not Linux-specific statx system call.
Change-Id: I3d17e14684e9cd77cdbfd0141b93c3bcbd27dbeb
When running `scons build/ALL/gem5.opt --with-ubsan`, with GCC, the
following error was returned:
```
[ CXX] src/base/loader/image_file_data.cc -> ALL/base/loader/image_file_data.o
In file included from /usr/include/string.h:535,
from /usr/include/c++/11/cstring:42,
from src/base/cprintf_formats.hh:33,
from src/base/cprintf.hh:38,
from src/base/logging.hh:49,
from src/base/loader/image_file_data.cc:40:
In function ‘char* strcpy(char*, const char*)’,
inlined from ‘int gem5::loader::doGzipLoad(int)’ at src/base/loader/image_file_data.cc:70:11,
inlined from ‘gem5::loader::ImageFileData::ImageFileData(const string&)’ atsrc/base/loader/image_file_data.cc:116:24:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:79:33: error: ‘void* __builtin_memcpy(void*, const void*, long unsigned int)’ offset [0, 19] is out of the bounds [0, 0] [-Werror=array-bounds]
79 | return __builtin___strcpy_chk (__dest, __src, __glibc_objsize (__dest));
| ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors
scons: *** [build/ALL/base/loader/image_file_data.o] Error 1
scons: building terminated because of errors.
```
As can be seen from this Daily test log:
https://github.com/gem5/gem5/actions/runs/8478384881, checkout@v2 and
{upload/download}-artifact@v3 was causing warnings to be thrown. This
fix upgrades all instances of these actions to the latest version (in
both cases, v4).
Currently, the citation string has a Unicode character. This works well
in gem5, but it breaks the gem5+SST simulation [1]. This change modifies
the letter "u" with umlaut to use TeX's escape sequence for this letter
instead of using the UTF-8 character.
[1] https://github.com/gem5/gem5/issues/982
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
A constructor is added to GuestAddr as suggested in the pull request
feedback. This allows a cast conversion from uint64_t GuestAddr. Hence,
the casting from uint64_t to GuestAddr by reinterpret_cast is removed
(was added in a previous commit).
using namespace pseudo_inst is also removed as requested.
Comments are added to GuestAddr.
Change-Id: Ib76de2bff285f4e53ad03361969c27f7bb2dfe9e
AMD's MI100 introduced a new register file called accumulation registers
for the matrix cores. In MI200 these were recombined into the same
register file according to the documentation. The accumulation register
file is the same size as the architectural register file, hence the size
is doubled.
The ISA spec does not explicitly state the register selector values,
however it does say that the accumulation offset from the kernel
dispatch packet should be added to the architecture register file
selector number when an instruction sets the ACC bit. Therefore we can
infer that the value must simply be an extension beyond the
architectural VGPRs.
This fixes errors of the form "invalid register selector: 512" (or
higher value). This was tested with the Learn the Basics tutorial
example on pytorch.org
Change-Id: I48ced1532fc166d2f5032fe21fbeba70ac77f258
v3 was causing a 'Node.js 16 actions are deprecated' error.
Note: download-artifact@v4 must be used with upload-artifact@v4 and
vice-versa.
Change-Id: Icb8ab6d27aed4557be95ce31dd89d4655010968e
This caused a 'Node.js 16 actions are deprecated;' error.
With this commit all our checkout actions are set to '@v4'.
Change-Id: I0f931bf7967f49ee44b7bf1d6a56e19f017fb948
This address, 0x0, is most likely a wrong address to call m5 ops.
The warning will catch the problem where m5op_addr is not initialized
properly.
Change-Id: I442b4806191ae6f5c137bc947f2a269684c599dd
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
The existing implementation of vfmv instruction did not type cast the
first element of the source vector, which caused the "freg" to interpret
the result as a NaN.
With the type cast to f32, the value is correctly recognized as float
and sign extended to be stored in the fd register.
Git issue: https://github.com/gem5/gem5/issues/827
Change-Id: Ibe9873910827594c0ec11cb51ac0438428c3b54e
---------
Co-authored-by: Debjyoti B <bhatta53@imec.be>
Co-authored-by: Tommaso Marinelli <tommarin@ucm.es>
As pointed out here [1], the expected M5OP_ADDR for arm64 arch
is 0x10010000. This change reflects that.
[1] https://github.com/gem5/gem5/pull/725
Change-Id: I7e72f5ea20d4aacf3115a485ba7cd664d33d037e
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
Remove the following files:
* src/dev/virtio/rng 2.cc
* src/dev/virtio/rng 2.hh
Which were a copy of rng.hh and rng.cc. Probably added to the repository
by accident. They were not compiled by scons
Change-Id: I9d1da19cc243c513ab7af887b1b6260d8e361b57
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Now that we are able to provide a view of the cache hierarchy from
the python world, we can start generating DTB entries for caches
and more specifically to properly fill the next-level-cache and
cache-level properties
Change-Id: Iba9ea08fe605f77a353c9e64d62b04b80478b4e2
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
One of things we miss in gem5 is the capability to neatly compose the
cache hierarchy of CPUs and clusters of CPUs. The BaseCPU
addPrivateSplitL1Caches and addTwoLevelCacheHierarchy APIs have
historically been used to bind cache levels together.
These APIs have been superseeded by the introduction of the Cache
hierarchy abstraction in the standard library. The standard library
makes it cleaner for a user to quickly instantiate a hierarchy of caches
with few lines of code. While this removes a lot of complexity for a
user, the Hierarchy objects still have little information about their
internal topology.
To address this problem, this patch adds a tree data structure to the
AbstractCacheHierarchy class, where every node of the tree represent
a cache in the hierarchy. In this way we will expose APIs for traversing
and querying the tree.
For example a 2 CPUs system with private L1, private L2 and shared L3
will contain the following tree:
[root]
|
[L3]
/\
/ \
[L2] [L2]
| |
[L1] [L1]
Change-Id: I78fe6ad094f0938ff9bed191fb10b9e841418692
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This is a first step towards offloading some of the partitioning
logic to the partitioning manager. We start with this patch
by replacing the static readPacketPartitionId into a virtual
method owned by the manager.
The issue with readPacketPartitionId as of now is that it relies
on the fixed PartitionFieldExtention.
We want user to be able to extract the partitionId more flexibly
and this requires using a SimObject
Change-Id: I3bd2e81e2a97c55fc83548956fc59f422c8049a6
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This commit updates cpu by removing VectorXXX types and updates
FUs according to the newer SimdXXX ones. This is part of the
homogenization of RISCV Vector instruction types, which moved
from VectorXXX to SimdXXX.
Change-Id: I84baccd099b73a11cf26dd714487a9f272671d3d
This commit adds some more detailed instruction types for RISC-V
Vector. Concretely, it substitutes VectorIntegerArith,
VectorFloatArith, VectorIntegerReduce and VectorFloatReduce with
more specific types related to the operation that each instruction
performs, being consistent with SimdXXX ones.
Change-Id: Iaffa74871ccc56d8c3627e1f1e111b9bc9e864af
This commit fixes two RISC-V instruction types (VectorXXX) that
were used in ARM SVE to the proper SimdXXX ones.
Change-Id: Id632926a89ae2395234f3cf34adeab63844bdd57
This PR fixes#948 in which running KVM CPUs through the updated gem5
interface in SE mode causes an immediate crash.
To fix this, I added a check to set_se_binary_workload that checks if
any of the cores are KVM, and if so, sets a couple of knobs for the
board and process that are required to make KVM work. The depecated
se.py script, which sets these knobs, is able to run KVM in SE mode just
fine, so doing the same here fixed the bug.
When running `scons build/ALL/gem5.opt --with-ubsan`, with GCC, the
following error was returned:
```
[ CXX] src/base/loader/image_file_data.cc -> ALL/base/loader/image_file_data.o
In file included from /usr/include/string.h:535,
from /usr/include/c++/11/cstring:42,
from src/base/cprintf_formats.hh:33,
from src/base/cprintf.hh:38,
from src/base/logging.hh:49,
from src/base/loader/image_file_data.cc:40:
In function ‘char* strcpy(char*, const char*)’,
inlined from ‘int gem5::loader::doGzipLoad(int)’ at src/base/loader/image_file_data.cc:70:11,
inlined from ‘gem5::loader::ImageFileData::ImageFileData(const string&)’ atsrc/base/loader/image_file_data.cc:116:24:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:79:33: error: ‘void* __builtin_memcpy(void*, const void*, long unsigned int)’ offset [0, 19] is out of the bounds [0, 0] [-Werror=array-bounds]
79 | return __builtin___strcpy_chk (__dest, __src, __glibc_objsize (__dest));
| ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors
scons: *** [build/ALL/base/loader/image_file_data.o] Error 1
scons: building terminated because of errors.
```
I do not know the exact issue but using strcpy in this way (i.e.
`strcpy(char_pointer + offset, string)`) appears to trigger this error
with the undefined behavior sanitizer. The fix in this patch replaces
this with `strcat`.
Change-Id: I1a0c50c9022adc841e175aad0fe2247bfcb29d71
This commit adds support for vector unit-stride segment store operations
for RISC-V (vssegXeXX). This implementation is based in two types of
microops:
- VsSegIntrlv microops that properly interleave source registers into
structs.
- VsSeg microops that store data in memory as contiguous structs of
several fields.
Change-Id: Id80dd4e781743a60eb76c18b6a28061f8e9f723d
Gem5 issue: https://github.com/gem5/gem5/issues/382
Implement several features new in ROCm 6.0 and features required for
future devices. Includes the following:
- Support for multiple command processors
- Improve handling of unknown register addresses
- Use AddrRange for MMIO address regions
- Handle GART writes through SDMA copy
- Implement PCIe indirect reads and writes
- Improve PM4 write to check dword count
- Implement common MI300X instruction
This update modifies the test configuration to specify the versions of
resources used, rather than automatically using the latest versions.
Previously, if a resource was updated for a change, it could potentially
cause tests to fail if those tests were incompatible with the new
version of the resource.
Now, with this change, tests are tied to specific versions of resources,
ensuring that any updates to resources will require corresponding
updates to the tests to maintain compatibility.
Change-Id: I9633b1749f6c6c82af6aa6697b7e7656020f62fa
Currently gem5 assumes that there is only one command processor (CP)
which contains the PM4 packet processor. Some GPU devices have multiple
CPs which the driver tests individually during POST if they are used or
not. Therefore, these additional CPs need to be supported.
This commit allows for multiple PM4 packet processors which represent
multiple CPs. Each of these processors will have its own independent
MMIO address range. To more easily support ranges, the MMIO addresses
now use AddrRange to index a PM4 packet processor instead of the
hard-coded constexpr MMIO start and size pairs.
By default only one PM4 packet processor is created, meaning the
functionality of the simulation is unchanged for devices currently
supported in gem5.
Change-Id: I977f4fd3a169ef4a78671a4fb58c8ea0e19bf52c
PCIe can read/write to any 32-bit address using the PCI index/index2
registers as an address and then reading/writing the corresponding
data/data2 register.
This commit adds this functionality and removes one magic value being
written to support GPU POST. This feature is disabled for Vega10 which
relies on an MMIO trace for too many values to implement in the MMIO
interface.
Change-Id: Iacfdd1294a7652fc3e60304b57df536d318c847b
The SRBM write packets where previously not required. This commit
implements SRBM writes to set a register by using the new setRegVal
interface. SRBM writes seem to be used for SRIOV enabled devices.
Change-Id: I202653d339e882e8de59d69a995f65332b2dfb8c
The top level AMDGPUDevice currently reads/writes all unknown registers
to/from a map containing the previously written value. This is intended
as a way to handle registers that are not part of the model but the
driver requires for functionality. Since this is at the top level, it
can mask changes to register values which do not go through the same
interface. For example, reading an MMIO, changing via PM4 queue, and
reading again returns the stale cached value.
This commit removes the usage of the regs map in AMDGPUDevice,
implements some important MMIOs that were previously handled by it, and
moves the unknown register handling to the NBIO aperture only. To reduce
the number of additional MMIOs to implement, the display manager in
vega10 is now disabled.
Change-Id: Iff0a599dd82d663c7e710b79c6ef6d0ad1fc44a2
The SDMA engine can potentially be used to write to the GART address
range. Since gem5 has a shadow copy of the GART table to avoid sending
functional reads to device memory, the GART table must be updated when
copying to the GART range.
This changeset adds a check in the VM for GART range and implements the
SDMA copy packet writing to the GART range. A fatal is added to write
and ptePde, which are the only other two ways to write to memory, as
using these packets to update the GART table has not been observed.
Change-Id: I1e62dfd9179cc9e987659e68414209fd77bba2bd
The write data packet can write multiple dwords but currently always
assumes there is one dword, which can cause some write data to be
missed. This case is not common, but the number of dwords is implicitly
defined in the PM4 header.
This changeset passes the PM4 header to write data so that the correct
number of dwords can be determined. For now we assume no page crossing
when writing multiple dwords as the driver should be checking for that.
Change-Id: I0e8c3cbc28873779f468c2a11fdcf177210a22b7
The ROCm 6.0 driver adds a node_id field to interrupts which must match
before passing on the interrupt to be cleared by the cookie from gem5's
interrupt handler implementation. Add this field and enable for gfx942.
The usage of the field can be seen in event_interrupt_isr_v9_4_3 at
https://github.com/ROCm/ROCK-Kernel-Driver/blob/roc-6.0.x/drivers/
gpu/drm/amd/amdkfd/kfd_int_process_v9.c#L449
Change-Id: Iae8b8f0386a5ad2852b4a3c69f2c161d965c4922
The main decoder for GPU instructions looks at the first 9 bits of a
dword to determine either the instruction or a subDecode table with more
information for specific instructions types. For flat instructions the
first 9 bits currently consist of 6 fixed encoding bits, a reserved bit,
and the first two bits of the opcode. Hence to support all opcodes there
are four indirections to the flat subDecode table. In MI300 the reserved
bit is part of a field to determine memory scope and therefore may be
non-zero.
This commit adds four addition calls to the subDecode table for the
cases where the scope bit is 1. See page 468 (PDF page 478) below:
https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/
instruction-set-architectures/
amd-instinct-mi300-cdna3-instruction-set-architecture.pdf
Change-Id: Ic3c786f0ca00a758cbe87f42c5e3470576f73a32
gpu-compute: Add support for skipping GPU kernels
This commit adds two new command-line options:
--skip-until-gpu-kernel N
Skips (non-blit) GPU kernels until the target kernel is reached.
Execution continues normally from there. Blit kernels are not skipped
because they are responsible for copying the kernel code and metadata
for the non-blit kernels. Note that skipping kernels can impact
correctness; this feature is only useful if the kernel of interest has
no data-dependent behavior, or its data-dependent behavior is not based
on data generated by the skipped kernels.
--exit-after-gpu-kernel N
Ends the simulation after completing (non-blit) GPU kernel N.
This commit also renames two existing command-line options:
--debug-at-gpu-kernel -> --debug-at-gpu-task
--exit-at-gpu-kernel -> --exit-at-gpu-task
These were renamed because they count GPU tasks, which include both
kernels launched by the application as well as blit kernels.
Change-Id: If250b3fd2db05c1222e369e9e3f779c4422074bc
MI200 adds support for four FP32 packed math instructions. These are
VOP3P instructions which have a negative input modifier field. The
description made it unclear if these were used for F32 packed math
however the assembly of some Tensile kernels are using these modifiers
therefore adding support for them. Tested with PyTorch nn.Dropout kernel
which is using negative modifiers.
Change-Id: I568a18c084f93dd2a88439d8f451cf28a51dfe79
The datatype is U32 but should be F32. This is causing an implicit cast
leading to incorrect results. This fixes nn.Dropout in PyTorch.
Change-Id: I546aa917fde1fd6bc832d9d0fa9ffe66505e87dd
This change fixes two issues:
1) The --cacheline_size option was setting the system cache line size
but not the Ruby cache line size, and the mismatch was causing assertion
failures.
2) The submitDispatchPkt() function accesses the kernel object in
chunks, with the chunk size equal to the cache line size. For cache line
sizes >64B (e.g. 128B), the kernel object is not guaranteed to be
aligned to a cache line and it was possible for a chunk to be partially
contained in two separate device memories, causing the memory access to
fail.
Change-Id: I8e45146901943e9c2750d32162c0f35c851e09e1
Co-authored-by: Michael Boyer <Michael.Boyer@amd.com>
From [1] The PrivateL1PrivateL2Cache hierarchy has been amended with an
MMUCache, which is basically a small cache in front of the page table
walker.
Not every ISA makes use of it.
Arm for example already implements caching of page table walks, via the
partial_levels parameter in the ArmTLB.
With this patch we define a new module which explicitly makes use of the
WalkCache. Configurations that do not require
another cache in the first level of the memsys (for the ptw) can use the
PrivateL1PrivateL2CacheHierarchy
[1]: https://gem5-review.googlesource.com/c/public/gem5/+/49364
In the RISC-V unprivileged spec[1], the misaligned load/store support is
depend on the EEI.
In the RISC-V privileged spec ver1.12[2], the PMA specify wether the
misaligned access is support for each data width and the memory region.
In the [3] of `mcause` spec, we cloud directly raise misalign exception
if there is no memory region misalignment support. If the part of memory
region support misaligned-access, we need to translate the `vaddr` to
`paddr` first then check the `paddr` later. The page-fault or
access-fault is rose before misalign-fault.
The benefit of moving check_alignment option from ISA option to PMA
option is we can specify the part region of memory support misalign
load/store.
MMU will check alignment with virtual addresss if there is no misaligned
memory region specified. If there are some misaligned memory region
supported, translate address first and check alignment at final.
[1]
https://github.com/riscv/riscv-isa-manual/blob/main/src/rv32.adoc#base-instruction-formats
[2]
https://github.com/riscv/riscv-isa-manual/blob/main/src/machine.adoc#physical-memory-attributes
[3]
https://github.com/riscv/riscv-isa-manual/blob/main/src/machine.adoc#machine-cause-register-mcause
As X86 and RISCV are relying on a Table Walker cache, we
change their stdlib configs to use the newly defined
PrivateL1PrivateL2WalkCacheHierarchy
Change-Id: I63c3f70a9daa3b2c7a8306e51af8065bf1bea92b
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
From [1] The PrivateL1PrivateL2Cache hierarchy has been amended
with an MMUCache, which is basically a small cache in front
of the page table walker. Not every ISA makes use of it.
Arm for example already implements caching of page table
walks, via the partial_levels parameter in the ArmTLB.
With this patch we define a new module which explicitly makes
use of the WalkCache. Configurations that do not require
another cache in the first level of the memsys (for the ptw)
can use the PrivateL1PrivateL2CacheHierarchy
[1]: https://gem5-review.googlesource.com/c/public/gem5/+/49364
Change-Id: I17f7e68940ee947ca5b30e6ab3a01dafeed0f338
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This commit adds two additional whitespaces in the definition of
GuestAddr as well as in the operator << overload.
Change-Id: Ifb371a09b378fcf4862a768f113b5963b45bd167
There were two errors causing the Weekly tests to fail. Each has a patch
in this PR:
1. Fixed incorrect version for the `artifact-downloader` (v4 instead of
v3).
2. Fixed incorrect use of `working-directory` which use of
`build/VEGA_X86/gem5.opt` to fail (not accessible in set
`working-directory`. The default `${github.workspace}` is sufficient.
Only the lower 32 bit of return values of pseudo instructions are
stored (in a0). Therefore, the upper 32 bit are stored in a1 to
enable a correct return value.
Change-Id: Idf33c325033281fc191a9285eb5d34fd4965cde9
With the previously introduced struct wrapper GuestAddr, the asm
tests fail. This patch substitutes implements SyscallABI32 similar
to RegABI32, i.e., as a struct based on GenericSyscallABI32.
Furthermore, a get function for arguments is implemented for wide
arguments. It returns the lower 32 bits of a register.
Change-Id: I233a67a5d5c15ab0d019a63bc57f1225288e33cc
In this patch, Addr is subtituted by a struct wrapper (uint64_t) in the
pseudo instruction functions. This enables a correct argument handling
in systems where pointer size != 64 bit.
Change-Id: Ie84b43b4ab8e6c0d38c7b6b16e19fc043110681b
"build/VEGA_X86/gem5.opt" is not available in directory "hip". `${
github.workspace}` is default should be run from there. This patch fixes
this.
Change-Id: I99875270c77dde92d3ec2ae0a07760905eaf903e
At the moment the SMMU is not handling translation errors gracefully the
way it is described by the SMMUv3 spec: whenever a translation fault
arises, simulation aborts as a whole. With this PR we add minimal
support for
translation fault handling, which means:
1) Not aborting simulation, but rather:
2) Writing an event entry to the SMMU_EVENTQ (event queue)
3) Signaling the PE an error arose and there is an event entry to be
consumed. The signaling is achieved
with the addition of the eventq SPI. Using an MSI is also possible
though it is currently disabled by the SMMU_IDR0.MSI being set to zero.
The PR is addressing issues reported by
https://github.com/orgs/gem5/discussions/898
The SMMU_IRQ_CTRL had been made optionally writeable by a
prior patch [1] even if interrupts were not supported in
the SMMUv3 model.
As we are partially enabling IRQ support, we remove this option
and we make the SMMU_IRQ_CTRL always writeable
[1]: https://gem5-review.googlesource.com/c/public/gem5/+/38555
Change-Id: Ie1f9458d583a5d8bcbe450c3e88bda6b3c53cf10
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
See https://github.com/orgs/gem5/discussions/898
The SMMUv3 Event Queue is basically unused at the moment. Whenever a
transaction fails we actually abort simulation. The sendEvent method
could be used to actually report the failure to the driver but it is
lacking interrupt support to notify the PE there is an event to handle.
The SMMUv3 spec allows both wired and MSI interrupts to be used.
We add the eventq_irq SPI param to the SMMU object and we draft an
initial sendInterrupt utility that makes use of it whenever it is
needed.
Change-Id: I6d103919ca8bf53794ae4bc922cbdc7156adf37a
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Rely on the architected solution instead of aborting simulation.
This means handling writes to the Event queue to signal managing
software there was a fault in the SMMU
Change-Id: I7b69ca77021732c6059bd6b837ae722da71350ff
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The struct fields of the SMMUEvent were not matching the SMMUv3 specs.
This was "not an issue" as events have been implicitly disabled until
now (every translation error was aborting simulation)
With generateEvent we automatically construct a SMMU event from
a translation result.
Change-Id: Iba6a08d551c0a99bb58c4118992f1d2b683f62cf
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
A faulting translation should return additional information
(other than the fault type). This will be used by future
patches to properly populate the SMMU event record of the
event queue
As we currenlty support two faults only:
1) F_TRANSLATION
2) F_PERMISSION
We add to TranslResult the relevant fault information only:
type, class, stage and ipa
Change-Id: I0a81d5fc202e1b6135cecdcd6dfd2239c2f1ba7e
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reading the Context Descriptor (CD) might require a stage2
translation. At the moment doReadCD does not check for the
return value of the translateStage2.
This means that any stage2 fault will be silently discarded
and an invalid address will be used/returned.
By returning a translation result we make sure any error
happening in the second stage of translation will be properly
flagged
Change-Id: I2ecd43f7e23080bf8222bc3addfabbd027ee8feb
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
We don't check the fault type directly. This will improve
readability once the TranslResult class will be augmented
with extra fields
Change-Id: I5acafaabf098d6ee79e1f0c384499cc043a75a9d
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This commit adds support for vector unit-stride segment load operations
for RISC-V (vlseg<NF>e<X>). This implementation is based in two types of
microops:
- VlSeg microops that load data as it is organized in memory in structs
of several fields.
- VectorDeIntrlv microops that properly deinterleave structs into
destination registers.
Gem5 issue: https://github.com/gem5/gem5/issues/382
With the introduction of multi-ISA gem5, we don't store the TARGET_ISA
anymore as a string in the root section of the checkpoint [1]. There is
therefore no way at the moment to asses the ISA of a CPU/ThreadContext.
This is a problem when it comes to checkpoint updates which are ISA
specific.
By explicitly serializing the ISA as a string under the cpu.isa section
we avoid this problem and we let cpt_upgraders be aware of the ISA in
use.
[1]: https://gem5-review.googlesource.com/c/public/gem5/+/48884
A set of cpt_upgraders was patching old checkpoints regardless
of the ISA in use. Thanks to the previous patch, we can now
retrieve the ISA of the CPU from the isa section.
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Change-Id: Ia110068c06453796cefac028ee13f21667e5371a
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
With the introduction of multi-ISA gem5, we don't store the TARGET_ISA
anymore as a string in the root section of the checkpoint [1]. There is
therefore no way at the moment to asses the ISA of a CPU/ThreadContext.
This is a problem when it comes to checkpoint updates which are ISA
specific.
By explicitly serializing the ISA as a string under the cpu.isa section
we avoid this problem and we let cpt_upgraders be aware of the ISA in
use.
[1]: https://gem5-review.googlesource.com/c/public/gem5/+/48884
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Change-Id: I1e75230cbc370cab84f4a54141b1e425af2dbfac
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
This commit adds statistics showing completed page walks for 4KB and 2MB
pages. This will add to stats.txt the variables num_4kb_walks,
num_2mb_walks and the corresponding values. This is done based on the
level of page table walk traversed specific to Sv39 Virtual Memory
System.
* Add Cache partitioning policies to manage and enforce cache
partitioning:
* Add Way partition policy
* Add MaxCapacity partition policy
* Add PartitionFieldsExtension Extension class for Packets to store
Partition IDs for cache partitioning and monitoring
* Modify Cache SimObjects to store partition policies
* Modify Cache block eviction logic to use new partitioning policies
Co-authored-by: Adrian Herrera <adrian.herrera@arm.com>
Change-Id: Ib35153a8b46803c22a433926270d82e5e19ce544
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This pull request contains a set of small patches which fix some bugs in
the gem5 prefetchers, and aligns out-of-the box prefetcher performance
more closely with that which a typical user would expect.
The performance patches have been tested with an out-of-the-box
(untuned) Stride prefetcher configuration against a set of SPEC 2017
SimPoints, and show a small-to-modest IPC uplift across the about half
the benchmarks, with no significant IPC degradation.
The new defaults were identified as part of work on gem5 prefetchers
undertaken by Nikolaos Kyparissas while on internship at Arm.
This PR is an updated version of PR #564, which was reverted due to Bug
#580. Bug #580 was fixed in PR #871. This PR updates #564 to the latest
state of the develop branch, and should be applied after PR #871.
Adding an error message in case the binary is not compatible with gem5.
This PR is addressing the comments in issue #807.
Change-Id: I66466ed6f657276c13d237fde3b1ec12c20cfe91
Previously merged PR #886 created pic.hart_config, but it was not
initialized properly in lupv_board.py. This issue is causing daily tests
to fail.
Change-Id: I193ff4a3e5ef787eefcf066404e762f024fa6603
---------
Co-authored-by: Yu-Cheng Chang <aucixw45876@gmail.com>
One of the limitations of the RegBank class is that it does not allow
you to pass a non-contiguous set of registers. Its simplest form will
just accept an initializer list of registers and it will store them in
sequence.
A more refined version [1] will optionally accept an offset value to be
passed alongside the register reference. This is not meant to be used by
the register bank to store the register at the provided offset.
It is rather used by the bank to sanity check the register sits exactly
at the provided range.
The way to work around this for a fragemented register space is to
explicitly allocate RAZ/RAO blocks as registers and to pass them to
addRegisters together with the others. (See the SysSecCtrl [2] as an
example)
This makes it a bit tedious to model a register bank with gaps between
its registers. First, the exact number and position of the gaps needs to
be extraced from a spec. These sometimes report only implemented
registers and their offset, and omit to document gaps/reserved space. So
a developer needs to manually add register offset and size to check if
all registers are contiguous. Second, these reserved register blocks
need to be instantiated in the bank adding boilerplate code and
affecting readibility.
For these reasons we add a new registration method, called
addRegisters*At*. It reuses the RegisterAdder class but this time the
offset field is really used to instruct the bank where the register
should be mapped. The method is templated and the template parameter
tells the bank which register type should be used to fill the remaining
space. We make the RegBank the owner of this filler space (registers are
generated internally within addRegistersAt).
[1]: https://github.com/gem5/gem5/blob/stable/src/dev/reg_bank.hh#L106
[2]: https://github.com/gem5/gem5/blob/stable/src/dev/arm/ssc.cc#L48
Change-Id: I614ae6e9eeb40b365ac9b6dd8b75abbfdb9cb687
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
* Add Cache partitioning policies to manage and enforce cache partitioning:
* Add Way partition policy
* Add MaxCapacity partition policy
* Add PartitionFieldsExtension Extension class for Packets to store
Partition IDs for cache partitioning and monitoring
* Modify Cache Tags SimObjects to store partition policies
* Modify Cache Tags block eviction logic to use new partitioning policies
* Add example system and TrafficGen configurations for testing Cache
Partitioning Policies
Change-Id: Ic3fb0f35cf060783fbb9289380721a07e18fad49
Co-authored-by: Adrian Herrera <adrian.herrera@arm.com>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This change adds a fatal statement to check all params for all
SimObjects have been unproxied before C++ object are created.
The fatal statement notifies the user of a mistake that could
possibly lead to a SimObject to not have its params unproxied.
This mistake could be made by adding a child SimObject with a
name that starts with an underscore.
This PR implements a few changes related to the accumulation offset
which is new in MI200. Previously MI100 contained two vector register
files: the architectural and accumulation register files. These have now
been unified and the architectural register file is twice the size. As a
result of this the dispatch packet set an offset into the unified vector
register file for where the former accumulation registers would go. The
changes are:
- Calculate the accumulation offset from dispatch packet and store in
HSA task.
- Update the accumulation move instructions (v_accvgpr_read/write) to
use it.
- Update the current MFMA instructions to use it.
- Make the MFMA examples more clean.
Initialize x86 process' max stack size to the value given in the process
params, rather than hard-coding it to 8 MB, which made it impossible to
run x86 programs requiring more than 8 MB of stack.
Change-Id: I0b17fe60b016b1e4a82d704ef7ad367974ea6a08
For some simulations with big values for VLEN (e.g. 8k and 16k) there
were more packets created on the fly and, as a consequence, failing the
simulations. The sanity check has been increased in order to solve this
high VLEN cases.
Supervised by [@aarmejach](https://github.com/aarmejach)
Change-Id: I137b0f3113687b3fc9c4154d19ca5e8017e6e992
Co-authored-by: Adrià Armejach <adria.armejach@bsc.es>
Adds categorization of bypassed atomics in TCC to the TBE as either
return or no-return, which gets consumed in pa_performAtomic to
determine if atomic logs should be stored.
Reestablishes TCC bypassed atomics after #546.
Change-Id: Ibc1fa2b795ef1c47c3893a0b1911fa7993522d38
Bypassed write though requests on invalid lines in the TCC should be
written though to the directory. This transition was previously missing.
Change-Id: I16b117c4e085ce6be0ed5297aa0129d52cd35a51
Adds categorization of bypassed atomics in TCC to the TBE as either return
or no-return, which gets consumed in pa_performAtomic to determine if
atomic logs should be stored.
Reestablishes TCC bypassed atomics after #546.
Change-Id: Ibc1fa2b795ef1c47c3893a0b1911fa7993522d38
The original version of `list_changes.py` assumed no more than one
`Change-Id` tag per commit. However, since transitioning to GitHub, the
repository now contains some merge commits containing multiple
`Change-Id`s.
This patch updates `list_changes.py` to support commits with any number
of `Change-Id` tags.
Fix QoS Memory Queue Policies
* Fix assertions in LRG policy to correctly assert requestor and list
validity
* Fix `selectPacket()` in LIFO Queue Policy to correctly return the end
of the `deque` backing store for its packet queue
This commit update the two exiting MFMA instructions to support the
accumulation offset for A, B, and C/D matrix. Additionally uses array
indexed C/D matrix registers to reduce duplicate code. Future MFMA
instructions have up to 16 registers for C/D and this reduces the amount
of code being written.
Change-Id: Ibdc3b6255234a3bab99f115c79e8a0248c800400
Bypassed write though requests on invalid lines in the TCC should be
written though to the directory. This transition was previously
missing.
Change-Id: I16b117c4e085ce6be0ed5297aa0129d52cd35a51
The accum offset is used as an index into the unified VGPR register file
in MI200 and is not the same as a move if accum_offset in the dispatch
packet is non-zero.
Change these instructions to use the stored accum_offset value.
Change-Id: Ib661804f8f5b8392e4c586082c423645f539e641
The accumulation offset is needed for some instructions. In order to
access this value we need to place it somewhere instruction definitions
can access. The most logical place is in the wavefront.
This commit simply copies the value from the HSA task to the wavefront
object.
Change-Id: I44ef62ef32d2421953f096c431dd758e882245b4
Fix#874, in which running se.py with 4GB or more memory (via option
--mem-size=4GB) causes all KVM programs to crash or hang. This occurred
because the m5ops address range (set to 0xFFFF0000-0x100000000)
overlapped with physical memory under such a configuration.
This patch fixes the bug by moving the m5ops address range if phyiscal
memory is >=4GB.
Change-Id: Ic8a004517bc2be2c27860ed314460be749a11dc1
Update the PLIC based on the
[riscv-plic-spec](https://github.com/riscv/riscv-plic-spec) in the PR:
- Support customized PLIC hardID and privilege mode configuration
- Backward compatable with the n_contexts parameter, will generate the
config like {0,M}, {0,S}, {1,M} ...
Change-Id: Ibff736827edb7c97921e01fa27f503574a27a562
In case ReadShared hit on a UD line and there's no sharers, this chage
makes the downstream passes Dirty to the requestor whenever possible
even though it doesn't deallocate the line. This will make the requestor
to SD and the downstream to UD_RSD.
In the previous implementation, loosely exclusive intermediate cache can
cause loss of dirty data. Example error condition is as below.
Configurations
L2 cache: Roughly inclusive to L1 without back-invalidation
- dealloc_on_* = false
- dealloc_backinv_* = false
L3 cache: Roughly exclusive to L2 without back-invalidation
- alloc_on_readshared = tue
- alloc_on_readunique = false
- dealloc_on_shared = false
- dealloc_on_unique = true
- dealloc_backinv_* = false
- is_HN = false
LLC: Same clusivity as L3 except is_HN = true
For all caches, allow_SD = true and fwd_unique_on_readshared = false
Example problem sequence:
1. L1 sends ReadUnique then becomes UD. L2 is UC_RU. L3 and LLC are RU.
2. L1 evicts the line to L2 by WriteBackFull (UD_PD). L2 becomes UD.
3. L2 evicts the line to L3 using WriteBackFull (UD_PD). L3 becomes UD.
4. L1 reads the line with ReadShared which misses on L2.
5. L2 reads the line with ReadShared which hits on L3. L3 becomes UD_RSC
because it doesn't deallocate the line (dataToBeInvalid=false)
6. L3 evicts the line to LLC by WriteCleanFull (UD_PD) because L3
doesn't back-invalidate and still has sharer. The local cache line is
invalidated by Deallocate_CacheBlock. L3 becomes RUSC and LLC becomes
UD_RU.
7. When UD_RU is evicted at LLC, the UD_RU line is dropped expecting the
upstream to writeback, causing loss of dirty data
The empty constructor prevent zero-initialization working correctly. In
this change we fix the issue by removing the unwanted empty constructor.
We also change the default destructor specification with c++11 style.
Change-Id: I869a93ca5283f811c2aa58406f1478459e0d7022
This commit optimizes the address generation logic in the strided
prefetcher by introducing the following changes
(d is the degree of the prefetcher)
* Evaluate the fixed prefetch_stride only once (and not d-times)
* Replace 2d multiplications (d * prefetch_stride and distance *
prefetch_stride) with additions by updating the new base prefetch
address while looping
Change-Id: I3ec0c642bc9ec7635b0d38308797e99b645304bb
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The Stride Prefetcher will skip this number of strides ahead of the
first identified prefetch, then generate `degree` prefetches at
`stride` intervals. A value of zero indicates no skip (i.e. start
prefetching from the next identified prefetch address).
This parameter can be used to increase the timeliness of prefetches by
starting to prefetch far enough ahead of the demand stream to cover
the memory system latency.
[Richard Cooper <richard.cooper@arm.com>:
- Added detail to commit comment and `distance` Param documentation.
- Changed `distance` Param from `Param.Int` to `Param.Unsigned`.
]
Change-Id: I4ce79c72d74445b12acf68e0a54e13966e30041c
Co-authored-by: Richard Cooper <richard.cooper@arm.com>
Signed-off-by: Richard Cooper <richard.cooper@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
pkt->req->isCacheMaintenance() would not include a check
for clean eviction before notifying the prefetcher,
causing gem5 to crash.
Change-Id: I4a56c7384818c63d6e2263f26645e87cef1243cb
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Update the default prefetch options to achieve out-of-the box
prefetcher performance closer to that which a typical user would
expect. Configurations that set these parameters explicitly will be
unaffected.
The new defaults were identified as part of work on gem5 prefetchers
undertaken by Nikolaos Kyparissas while on internship at Arm.
Change-Id: Ia6c1803c86e42feef01de40c34d928de50fe0bed
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Prefetch queue entries were being squashed by comparing the address
of each queued prefetch against the block address of the demand
access. Only prefetches that happen to fall on a cache-line block
boundary would be squashed.
This patch converts the prefetch addresses to block addresses before
comparison.
Change-Id: I3a80a1e3d752f925595e33edebf5359d2cc67182
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
* Fix selectPacket() in LIFO Queue Policy to correctly return the end of
the `deque` backing store for its packet queue
* Move selectPacket() implementations for FIFO and LIFO queues into
`q_policy.cc` file
Change-Id: I8c35e5fc83dc380b19f52be14c18b1f414f9e141
According to the RISC-V spec [1]. Any float-point instructions
accumulate FFLAGS register rather than write it to reflect the CSR
behavior.
In the previous implementation. We read the FFLAGS, set the exception
flags, and write the result back to the FFLAGS. This works in the gem5
simple and minor CPU model as they are actually written to `regFile`
after executing the instructions. However, in the gem5 O3 CPU model, it
will record in the `destMiscReg` buffer until the commit stage when
writing to the `miscReg` in the execution stage. The next instruction
will get the old FFLAGS and cause the incorrect result.
The CL introduced the `MISCREG_FFLAGS_EXE` and used the same size of
`miscRegFile` because the `MISCREG_FFLAGS_EXE` and `MISCREG_FFLAGS`
shared the same space. When executing the float-pointing instruction,
any exception flags should be updated via `MISCREG_FFLAGS_EXE` to
accumulate the FFLAGS in `setMiscReg` method. For the MISCREG_FFLAGS, it
should only be called in the CSROp.
[1] Syntactic Dependencies: Appendix A
c80ecada1c/src/mm-eplan.adoc (syntactic-dependencies-rules-9-11)
gem5 issue: https://github.com/gem5/gem5/issues/755
Change-Id: Ib7f13d95b8a921c37766a54a217a5a4b1ef17c6f
Fix#876. The x87 floating-point control word (FCW) was not initialized
at process startup in syscall emulation mode. This resulted in floating
point exceptions in KVM mode when executing x87 floating-point
instructions.
This patch fixes the bug by initializing FCW to its reset value, 0x37F.
Change-Id: Idd1573c6951524ef59466cc5c9f1e640ea7658ae
ArmSigInterruptPin don't send the interrupt to GIC. Instead it sends the
interrupt to the irq specified in Param. When using ArmSigInterruptPin,
we shouldn't ask users to provide "Platform" since it doesn't need it.
To reduce the confusion, this change removes the dependency of Platform
for ArmSigInterruptPin.
Change-Id: I0ee507ed1c08b4fa6d3e384e28732f3acb4f6892
This PR is fixing https://github.com/gem5/gem5/issues/668. It fixes it
for all ISAs other than Arm with the first commit, which is setting the
number of architectural Matrix registers to 0 for those ISA which are
not using them.
It then partly fixes it for Arm as well with the 2nd commit: by removing
RenameMap::numFreeEntries we don't stall renaming unless a matrix
instruction is encountered... This means most binaries will run with SMT
as long as they don't use FEAT_SME instructions. Please note: this is
not simply a SMT fix, it will generally address a shortcoming in the way
we were renaming instructions.
If an Arm binary wants to use SMT with FEAT_SME, the 4th commit will
make sure the lack of physical registers is notified explicitly at the
beginning of simulation, rather than silently blocking renaming
When processing memory Packets for prefetch, the `PrefetchInfo` class
constructor will attempt to copy the `Packet` data. In cases where the
`Packet` under consideration does not contain data, an assertion will be
triggered in the Packet's `getConstPtr` method, causing the simulation
to crash.
This problem was first exposed by Bug #580 when processing an
`UpgradeReq` memory packet.
This patch addresses the problem by suppressing the copying of the
`Packet` data during construction of a `PrefetchInfo` object in cases
where the `Packet` has no data.
This patch addresses Bug #580 [1], which was exposed by PR #564 [2],
subsequently reverted by PR #581 [3]
[1] https://github.com/gem5/gem5/issues/580
[2] https://github.com/gem5/gem5/pull/564
[3] https://github.com/gem5/gem5/pull/581
Change-Id: Ic1e828c0887f4003441b61647440c8e912bf0fbc
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Those are supposed to control trapping for accesses to debug registers
Change-Id: I4a25a379e718ea6d5ea8ae22ac7edbeb452d1836
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
`assert(interruptID >=0)` is always true as `interruptID` is an unsigned
int.
This was causing compilation tests failures in GCC-8 with the following
error:
```sh
src/arch/riscv/interrupts.cc:47:32: error: comparison is always true due to limited range of data type [-Werror=type-limits]
assert(interruptID >= 0);
```
Change-Id: I356be78d7f75ea5d20d34768fb8ece0f746be2fc
This PR adds support for SQC (GPU I-cache) invalidation to the GPU
model. It does this by updating the GPU-VIPER-SQC protocol to support
flushes, the sequencer model to send out invalidates and the gpu compute
model to send invalidates and handle responses. It also adds support for
S_ICACHE_INV, a VEGA ISA instruction that invalidates the entire GPU
I-cache. Additionally, the PR modifies the kernel start behavior to
invalidate the I-cache too. It previously invalidated only the L1
D-cache.
Previously, the S_ICACHE_INV instruction was unimplemented and
simulation panicked if it was encountered. This commit adds support for
executing the instruction by injecting a memory barrier in the scalar
pipeline and invalidating the ICACHE (or SQC)
Change-Id: I0fbd4e53f630a267971a23cea6f17d4fef403d15
In ComputeUnit, a previous commit added a SystemHubEvent event class to
the SQCPort. This was found to be unnecessary during the review process
and is removed in this commit. Similarly, invBuf() which was added in
FetchUnit as part of an earlier commit was found to be redundant. This
commit removes it
Change-Id: I6ee8d344d29e7bfade49fb9549654b71e3c4b96f
Previously, the data caches were invalidated at the start of each
kernel. This commit adds support for invalidating instruction cache at
kernel launch time
Change-Id: I32e50f63fa1442c2514d4dd8f9d7689759f503d3
This commit adds support for injecting a scalar memory barrier in the
GPU. The barrier will primarily be used to invalidate the entire SQC
cache. The commit also invalidates all buffers and decrements related
counters upon completion of the invalidation request
Change-Id: Ib8e270bbeb8229a4470d606c96876ba5c87335bf
This commit adds support for cache invalidation in GPU VIPER protocol's
SQC cache. To support this, the commit also adds L1 cache invalidation
framework in the Sequencer such that the Sequencer sends out an
invalidation request for each line in the cache and declares completion
once all lines are evicted.
Change-Id: I2f52eacabb2412b16f467f994e985c378230f841
This PR removes a circular dependency between `QoSMemSinkCtrl` and
`QoSMemSinkInterface` that prevented the `controller()` function of
`QoSMemSinkInterface` from being used by removing the default value for
`QoSMemSinkCtrl.interface`.
Change-Id: I4ecc39b974e239be1a2e9285e1f6f8ea873c018d
The vlm.v and vsm.v unit-stride mask load/store instructions are
constructed with an incorrect VL when the current one is larger than
than VLEN/EEW (i.e. when LMUL > 1). This commit fixes the issue for both
instructions.
We were having some difficulty on a server running this
`apt-apt-repository` command due to suspected firewall issues. On
further inspection is appear to be superfluous as git can be obtained
easily through `apt-get` without adding this repository.
This patch provides unit-stride fault-only-first loads (i.e. vle*ff) for
the RISC-V architecture.
They are implemented within the regular unit-stride load (i.e. vle*). A
snippet named `fault_code` is inserted with templating to change their
behaviour to fault-only-first.
A part from this, a new micro based on the vset\*vl\* instructions
(VlFFTrimVlMicroOp) is inserted as the last micro in the macro
constructor to trim the VL to it's corresponding length based on the
faulting index.
This trimming micro waits for the load micros to finish (via data
dependency) and has a reference to the other micros to check whether
they faulted or not. The new VL is calculated with the VL of each micro,
stopping on the first faulting one (if there's such a fault).
I've tested this with VLEN=128,256,...,16384 and all the corresponding
SEW+LMUL configurations.
Change-Id: I7b937f6bcb396725461bba4912d2667f3b22f955
This commit allows CompData_SD be sent when ReadShared hits on UD line and
the local cache keeps the line, unless the request doesn't allow SD.
Change-Id: I337f24c871cc4c19c5b5fb11f9b35c0a8eb7911c
In case ReadShared hit on a UD line and there's no sharers, this chage
makes the downstream respond with Unique even though it doesn't deallocate
the line. This will make the requestor to UD and the downstream to UD_RU.
In the previous implementation, loosely exclusive intermediate cache can
cause loss of dirty data. Example sequence is as below.
Configurations
L2 cache: Roughly inclusive to L1 without back-invalidation
- dealloc_on_* = false
- dealloc_backinv_* = false
L3 cache: Roughly exclusive to L2 without back-invalidation
- alloc_on_readshared = tue
- alloc_on_readunique = false
- dealloc_on_shared = false
- dealloc_on_unique = true
- dealloc_backinv_* = false
- is_HN = false
LLC: Same clusivity as L3 except is_HN = true
For all caches, allow_SD = true and fwd_unique_on_readshared = false
Example problem sequence:
1. L1 sends ReadUnique then becomes UD. L2 is UC_RU. L3 and LLC are RU.
2. L1 evicts the line to L2 by WriteBackFull (UD_PD). L2 becomes UD.
3. L2 evicts the line to L3 using WriteBackFull (UD_PD). L3 becomes UD.
4. L1 reads the line with ReadShared which misses on L2.
5. L2 reads the line with ReadShared which hits on L3. L3 becomes UD_RSC
because it doesn't deallocate the line (dataToBeInvalid=false)
6. L3 evicts the line to LLC by WriteCleanFull (UD_PD) because L3 doesn't
back-invalidate and still has sharer. The local cache line is
invalidated by Deallocate_CacheBlock.
L3 becomes RUSC and LLC becomes UD_RU.
7. When UD_RU is evicted at LLC, the UD_RU line is dropped expecting the
upstream to writeback, causing loss of dirty data.
Change-Id: Ic9bee27f2ec8906dd5df8bd3be60e5a9a76c782f
In Ruby CHI protocol UD_RU state means the line is in UD state in
the local cache and the upstream may have it in UD or UC state.
In the previous implementation UD_RU line was just dropped without
WriteBack which can cause loss of dirty data when the upstream has it
in UC state.
This commit fixes it by performing WriteBack when evciting UD_RU line.
Change-Id: I1db9b4f95cc576e71dcef38b01de24775df514ba
Vector unit-stride instructions have an EEW encoded directly in the instruction,
We should use that instead of SEW in vtype.
Change-Id: I282041ce8ed57fbcca899f7497ef6c6fb2dfcf85
Besides the standard RISC-V interrupts software, timer, and external
interrupt, the RISC-V specification also offers the possibility to
implement local interrupts. With this patch, we contribute an extension
of RiscvInterrupts that enables connecting interrupt sources to the
local interrupt controller. We assigned the local interrupts to
machine-level and gave them the highest priority. If two local
interrupts are pending, there exception code will be the tie-breaker
(higher ID > lower ID). 32 Bit systems only recognize the local
interrupts 16 to 31, 64 Bit systems 16 to 63.
Change-Id: Iff8d34e740b925dce351c0c6f54f4bd37a647e0c
---------
Co-authored-by: Robert Hauser <robert.hauser@uni-rostock.de>
This allows us to manually trigger daily test runs rather than wait for
the scheduled time. This can be useful in cases where a fix for a broken
test is pushed and we wish to verify it works as intended ASAP.
This change allows pyunit tests to be run on specific directories
instead of the default `pyunit` directory.
You can pass in the directory as follows. I have built gem5.opt for
RISCV however it should work the same with other builds
```
./build/RISCV/gem5.opt tests/run_pyunit.py --directory tests/pyunit/gem5/
```
The default path works as it is currently
```
./build/RISCV/gem5.opt tests/run_pyunit.py
```
Change-Id: Id9cc17498fa01b489de0bc96a9c80fc6b639a43f
Signed-off-by: Suraj Shirvankar <surajshirvankar@gmail.com>
The RISC-V privilege spec don't specify the implementation of
PMA(physical memory attribute), which is addressed in the previous
CL[1].
This CL creates the BasePMAChecker to support customized PMA so that we
can only focus on the features wanted in the study. The CL also leaves
the common methods `check` and `takeOverFrom` to make MMU easy to
interact with PMA.
[1] https://gem5-review.googlesource.com/c/public/gem5/+/40596
Change-Id: I9725e3a8f7f9276e41f0d06988259456149d2a77
Crypto instructions will cause an undefined instruction when executed
with SIMD disabled. The PR is also
refactoring their implementation by checking the release object instead
of the ID register field. This is improving
readability
The backdoor request in b_transport is only used for hinting the dmi
capability. Since most of traffic patterns are continous, we can cache
the previous backdoor request result to spare the backdoor inspect of
next request.
Change-Id: I53c47226f949dd0be19d52cad0650fcfd62eebbc
This commit adjusts the logic in VectorFloatMaskMacroConstructor to
ensure the %(copy_old_vd)s section is not skipped when vl = 0, ensuring
correct values in destination vector register.
Change-Id: I2478722d6f003a0f2e4b3cd0ba3e845bed938ee6
This is the same problem as #715 .
We not only check for the presence of the relative FEAT_*,
we also check if AdvSIMD is enabled; we throw an undefined
instruction otherwise.
Change-Id: I1fd0cdc8057c5a7901774802dc076817f06c8e66
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Check directly if extension is enabled instead of looking
for ID register field value. This makes the code more readable
Change-Id: If0b882ac3464c3587731b72a7edb3b8b65ea86c7
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Having the number of physical registers matching exactly the number of
architectural ones does not guarantee a proper execution as it means the
freeList would have 0 registers available for renaming. In this case the
worst would happen: renaming would silently stall execution
indefinitely. With this change we report the issue to the user and fail
execution
Change-Id: I1eb968802f1a1a5115012f44b541542a682f887d
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The method is extracting the minimum number of [1] non-zero free
registers/entries across all register classes. This means that if we
have saturated all register storage for a particular class, renaming
will stop as a whole.
I believe it does make sense to keep renaming and only block renaming in
case an instruction requiring the particular register type is
encountered. This would happen with the Rename::renameInsts method
[1]: https://github.com/gem5/gem5/blob/stable/src/cpu/o3/rename_map.hh#L269
[2]: https://github.com/gem5/gem5/blob/stable/src/cpu/o3/rename.cc#L662
Change-Id: I932826a77a5c0b2e05d8fdcab0e6ca13cf0e3d23
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This is working around an existing SMT issue [1].
The BaseO3CPU uses two physical matrix registers [2]. This is
enough for a single threaded CPU which as of now uses
1 architectural matrix only.
The problem arises when SMT is enabled. As 2 architectural matrices
need to be supported by a single CPU, the O3CPU won't have any available
register in the freeList for renaming. This causes the SMT O3CPU to
indefinitely stall renaming [3]
If the archtectural number of registers is seto to 0, the regclass won't
be taken into consideration when evaluating if we can rename
instructions.
This issue has been implicitly fixed for RISCV by a preceding PR [4]
[1]: https://github.com/gem5/gem5/issues/668
[2]: https://github.com/gem5/gem5/blob/stable/src/cpu/o3/BaseO3CPU.py#L170
[3]: https://github.com/gem5/gem5/blob/stable/src/cpu/o3/rename.cc#L1228
[4]: https://github.com/gem5/gem5/pull/83
Change-Id: I99bfdefff11a246b1f191251dc67689e95b3f0db
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This is more complaint with the VMSAv8-64, which is using Translation
Regimes instead of
historical (Armv7) isHyp tagging and the ExceptionLevel managing the
translation. This greatly
simplifies translation code, specially with FEAT_VHE where the managing
el (EL2) could handle to different
translation regimes (EL and EL2&0).
The PCI configuration space is 256 bytes, yet because the
PCI_CONFIG_SIZE macro is 0xff, the final register allocation in the IDE
controller only allocated up to byte 255.
Change-Id: I1aef2cad9df366ee8425edb410037061eb29ae33
This change improves the functionality of strided generator to create
trace with better flexibility.
It allows the user to manually set offset and stride size instead of
calculating it based on a "gen_id".
This way different patterns could be created with the same SimObject.
In addition, this change adds stdlib components for strided generator.
We therefore rename it to exceptionLevel
Change-Id: I2a3aabaefa315d95bd034b13d95d5a5b0b8e9319
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
With the old code, the MAIR_EL1 register was checked when inserting
an EL2&0 TLB entry
Change-Id: I064032fb2946777c2f4c50c06a124f828245e18a
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The problem with:
ELIs64(tc, aarch64EL == EL0 ? EL1 : aarch64EL);
Is that when we are executing at EL0 in host (EL2&0 translation
regime), the execution mode (AArch32 vs AArch64) is dictated
by EL2 and not by EL1 (which is the guest)
Change-Id: I463a2a9461c94d0886990ae3d0a6e22aeb4b9ea3
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This is the final step in the transformation process.
We limit the use of the "managing Exception Level" for
a translation in favour of the more standard "Translation
Regime"
This greatly simplifies our code, especially with VHE
where the managing el (EL2) could handle to different
translation regimes (EL and EL2&0).
We can therefore remove the isHost flag wherever it got
used. That case is automatically handled by the proper
regime value (EL2&0)
Change-Id: Iafd1d2ce4757cfa6598656759694e5e7b05267ad
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The Xt is not part of the architectural name of the register
and it was likely added with the introduction of extended
register (Xt) TLBIs in Armv8 to differentiate them with
the old Armv7 ones.
The use of _Xt was not consistent anyway: newer TLBIs were
already omitting it.
Change-Id: Ic805340ffa7b5770e3b75a71bfb76e055e651f8b
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
We should stop using isHyp.. An hypervisor entry is flagged
already by the EL of the entry (el == EL2)
Change-Id: I20c3d06fa2b04e0b938a380ca917d0b596eddcf2
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The isHyp descriptor is an old artifact of armv7 and it flags a PL2
(AArch32) or EL2 & EL2&0 (AArch64) translations.
It is commonly set according to the EL/mode [1] but it may differ from
the execution state in case of explicit translation requests (via
the AT instruction as an example [2]).
There is really no need to complicate the masking of isHyp. We should
just make use of the tranType method (in charge of setting aarch64EL)
to properly set aarch64EL, and make isHyp coincide with the case of
aarch64EL == EL2.
This is a step towards the removal of the isHyp flag.
More specifically the patch does the following:
* HypMode translation type moved in the EL2 case
The translation is used by
ATS1HR/ATS1HW:
Performs stage 1 address translation as defined for PL2 and the
Non-secure state
* S1S2NsTran translation type moved in the EL1 case
The translation is used by
ATS12NSOPR/ATS12NSOPW:
Performs stage 1 and 2 address translations as defined for PL1 and the
Non-secure state
* S1CTran translation type can be at either EL1 or EL3
The translation is used by
ATS1CPR/ATS1CPW
Performs stage 1 address translation as defined for PL1 and the current
Security state
[1]: https://github.com/gem5/gem5/blob/stable/src/arch/arm/mmu.cc#L1281
[2]: https://github.com/gem5/gem5/blob/stable/src/arch/arm/mmu.cc#L1282
Change-Id: Ie653170f6053c5d8141a2de9f50febf5bf53ab9c
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This removes the gfx803 and gfx801 targets for building applications as
GCN3 will be removed from gem5. It also removes the copy/paste bug from
the HACC docker which is clobbering the HCC targets and removing gfx902.
Change-Id: I9a0d7fda437e797baf0f743a0a450948b9260b07
Co-authored-by: Harshil Patel <hpppatel@ucdavis.edu>
This change adds support to use KVM cores on the ARM board. The board
simulates gic to enable KVM, similar to the gem5 ARM FS configs. The
limitation is that it only supports VExpress_GEM5_V1.
Signed-off-by: Kaustav Goswami <kggoswami@ucdavis.edu>
This pull request has two commits, one is to fix the segmentation fault,
> arch-riscv: Fix segmentation fault in vmsbf/vmsof/vmsif
This commit simplifies the conditional logic in vmsbf/vmsof/vmsif
by removing an unnecessary variable and condition.
The updated logic checks 'this->vm' or the result of 'elem_mask(v0, i)'
directly, which prevents a segmentation fault regardless of
whether 'vm' is set or not.
another is to fix the incorrect output,
> arch-riscv: Add template Vector1Vs1VdMaskDeclare
This commit adds a new template, Vector1Vs1VdMaskDeclare, to replace
the use of Vector1Vs1RdMaskDeclare in Vector1Vs1VdMaskFormat.
The change addresses the issue with the number of indices in
srcRegIdxArr.
Only two indices are available in Vector1Vs1RdMaskDeclare, but
instructions
that use Vector1Vs1VdMaskFormat, like 'vmsbf', require three indices
(for vs1, vs2(old_vd), and vm) to function correctly.
Demonstration of incorrect output compared with spike:
[vmsbf](https://github.com/QQeg/rvv_intrinsic_testcases/tree/master/vmsbf)
```
**** REAL SIMULATION ****
src/sim/simulate.cc:199: info: Entering event queue @ 0. Starting simulation...
Vs1 = 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Vd = 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Exiting @ tick 23504000 because exiting with last active thread context
----SPIKE----
bbl loader
Vs1 = 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Vd = 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
```
We should clean the instruction buffer after the fence.i is execute
to avoid execute old instruction for self-modifying code
Change-Id: Iece0ee0d10631fcd9bd17ee67cf0c92f72acdbd8
This commit adds a new template, Vector1Vs1VdMaskDeclare, to replace
the use of Vector1Vs1RdMaskDeclare in Vector1Vs1VdMaskFormat.
The change addresses the issue with the number of indices in srcRegIdxArr.
Only two indices are available in Vector1Vs1RdMaskDeclare, but instructions
that use Vector1Vs1VdMaskFormat, like 'vmsbf', require three indices
(for vs1, vs2(old_vd), and vm) to function correctly.
Change-Id: I0c966e11289ce07efcc3b0cc56948311289530ad
This commit simplifies the conditional logic in vmsbf/vmsof/vmsif
by removing an unnecessary variable and condition.
The updated logic checks 'this->vm' or the result of 'elem_mask(v0, i)'
directly, which prevents a segmentation fault regardless of
whether 'vm' is set or not.
Change-Id: I799fa7b684ff98959a64f9694ef9c854f3a1f43a
These are:
FEAT_AES,
FEAT_PMULL,
FEAT_SHA256,
FEAT_SHA1,
FEAT_CRC32
With this patch we are also enabling them by default by adding them to
the Armv8 release object. Some of them are mandatory anyway since
Armv8.1
Change-Id: I221ae8646d91151fdfaf97a4815168a4fe3d8c5a
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
If the NOP count of an SDMA NOP packet goes beyond the wptr address, the
queue decode method will loop infinitely. If a packet comes in with a
bad count this causes gem5 to hang. This change advances the rptr one
dword at a time until either reaching the NOP count or when rptr == wptr
to prevent this issue.
Change-Id: Ib2c0f74a477bff27890c9c064bb4190e76e513bd
The .mailmap file is designed to maintain a record of unique
contributors, aiming for a single identifier for each person. What is
included in this file does not impact or alter commits; rather, it just
merges the counts for all commits by one person under a single name.
Related to issue #703 , this PR removes GCN3 related files and updates
source code, documentation, and tests to switch over to Vega is that was
not done already. Highlights are:
- Remove all src/arch/amdgpu/gcn3 files and update Kconfigs.
- Remove references to GCN3 and replace with Vega where applicable.
- Update the build targets in the gcn-gpu Docker. This will need to be
rebuilt but not urgently.
- Remove the GCN3 tag in testlib. Most tests seem to be using Vega
already, so that commit is small.
Vega (gfx900) introduced new memory aperture registers to get the base
address and limit for LDS and private (scratch) memory. These have not
commonly been used by the compiler until ROCm 6. Now that the compiler
is generating reads from these special registers, implement the support
for them.
Tested with LULESH which is using the SHARED_BASE register (LDS) with
ROCm 6.0. This assembly seems to replace S_GETREG_B32 emitted by the
ROCm 5 compiler.
Change-Id: Id2bd26ce8ef687c84a647fa2ac2da54d657913e5
By default all SDMA queues are privileged queues, meaning the addresses
in SDMA packets use the privileged translation tables. RLC queues
(sometimes called user queues) are not necessarily privileged and might
use user translation tables. RLC queues are used more often in ROCm 6.0
exposing an issue with invalid translations with RLC queues.
This changeset checks the priv bit in the SDMA MQD when an RLC queue is
mapped. Each packet type which uses an address then checks the bit
before performing translation. Tested with daily/weekly tests with a
ROCm 6.0 disk image and tests are passing.
Change-Id: I6122fbc194e8d6f5d38e81f1b0e11646d90e0ea0
This PR reorganizes the instructions.cc into multiple files and renames
some files which do not match their corresponding header file names. The
intention is to make iterating on development of these files faster.
An issue raised in #240 where if an address range ends
at the last byte of a 64 bit address space, it will be
considered a subset of any other address range that starts
at the first byte of the range.
Change-Id: I517f4717052eda2504de971be0eb59ee9a623dd3
This separation was only for convenience while GPU tests were under
development and rapidly changing. This test merges the GPU tests into
the weekly tests where they belong.
The files registers.cc, isa.cc, and decoder.cc do not match the header
name. This is a minor cleanup to make development more straightforward.
Change-Id: Ibab18dfe315b0ce84359939b490f8227ea43cac0
The Vega instructions.cc file is 47k lines long which results in both
large compilation times whenever it is modified and long style check
times. This makes iterating over more complex instruction
implementations very time consuming.
This commit moves the instruction definitions to multiple files based on
the instruction encoding (SOP2, VOP2, FLAT, DS, etc.). The resulting
files are much smaller (max is 8k lines) and compilation and style check
times are much more reasonable. Other than moving code around, there are
no functional changes in this commit.
Change-Id: Id4ac8e98ef11a58de5fd328f8a0cd7ce60a11819
The addition of std::optional in #732 caused a compile error. This
change fixes the error by checking to see if the value is present and
panicing otherwise.
Change-Id: I46c3fb76eb0e14ba7bede7c336293fbe9add8c84
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
1. Add the new double width for int64_t and uint64_t
2. Use the wider type to get the upper result of multiplication
Change-Id: Id6cfa6f274c65592b2b3e2b70c00f82954b41f1a
Replace instances of "GCN3" with Vega. Remove gfx801 and gfx803. Rename
FIJI to Vega and Carrizo to Raven.
Using misc since there is not enough room to fit all the tags.
Change-Id: Ibafc939d49a69be9068107a906e878408c7a5891
I’ve been working on a fix for the issue #759 where ‘vd’ incorrectly
stores all zeros when ‘vl’ is set to 0 in VectorIntMaskMacroConstructor.
My solution seems to work, but it behaves differently from other macros
when ‘vl’ = 0. Instead of pushing a ‘nop’ to ‘microops’, it pushes a
micro operation that remains ineffective due to ‘vl’ being 0.
When restoring the simulate_limit_event pointer is not
restored after running the dry simulation run which ends up in
"Panic: event not found!"
In this commit we fix this issue by correctly restoring
the pointer value along with the event queue head
Change-Id: Id5ad4d2a270a6cd34eec1dc5c9b170b2b84610d4
---------
Co-authored-by: narya <nitish.arya@bsc.es>
Co-authored-by: Jason Lowe-Power <jason@lowepower.com>
Newer compilers error on -Warray-length in the recent MI200 patches due
to casting from a 32-bit data type to a 64-bit type. Change it to cast
the 32-bit integer first then 64-bit integer latter to remove the
warning.
Rerun of validation tests on the three instructions passed.
Change-Id: I0309e5f7b5b8cc8ce1651660ddddb120fa6e7666
Currently, the TLB enforces that the bit 63 of a physical address to be
zero. This check stems from the riscv-tests that checks for the bit 63
of a physical address [1]. This is due to the fact that the ISA
implicitly says that the physical address must be zero-extended on the
most significant bits that are not translated [2]. More details on this
issue is here [3].
The check for bit 63 of a physical address in the TLB is rather too
specific, and I believe the check of invalid physical address is alread
implemented in PMA. Thus, this change proposes to remove this check from
RISC-V TLB.
[1]
bd0a19c136/isa/rv64mi/access.S (L18)
[2] https://groups.google.com/a/groups.riscv.org/g/isa-dev/c/8kO7X0y4ubo
[3] https://github.com/gem5/gem5/issues/238
Change-Id: I247e4d4c75c1ef49a16882c431095f6e83f30383
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
This is matching what we are already doing in the starter_fs.py script
Change-Id: I50239050be9bd151a607ec892f8dd9322b24040b
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The PMAChecker and PMP are only used in the RisvISA and it should be in
the RiscvISA to simply the implementation
Change-Id: I4968e2de4c028cb2dceed977f2173fc8b1efd175
The RFC is defaulted to a size of 0 which removes it completely. To use
the RFC set the --register-file-cache-size to a non-zero multiple of
two. In addition, rfc_pipe_length may be altered to increase or decrease
RFC latency benefit.
The RFC is defaulted to a size of 0 which removes it completely. To use
the RFC set the --register-file-cache-size to a non-zero multiple of
two. In addition, rfc_pipe_length may be altrered to increase or
decrease RFC latency benefit.
Change-Id: I6f5bf5b750eb64155fbc8c8343e9feadce5c9f79
This patch is amending encodeAArch64SysReg so that it covers the case
where there are no arch numbers available for the misc index passed as
an argument.
This could happen if the register ID is a gem5 pseudo register which is
not associated with any architected op1/op2/crn/crm tuple.
Rather than panicking we return a nullopt.
Change-Id: I7ab70467105ef93c0c78ac4e999c7dc8e5e09925
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Implemented according to the ISA spec. Validated with silion. In
particular the sign extend is important for the signed variants and the
unsigned variants seem to overflow lanes (hence why there is no mask()
in the unsigned varints. FP16 -> FP32 continues using ARM's fplib.
Tested vs. an MI210. Clamp has not been verified.
Change-Id: Ifc09aecbc1ef2c92a5524a43ca529983018a6d59
Starting with MI200, packed math can operate on double dword inputs. In
this case, 64-bits of inputs (two VGPRs per lane) contain two FP32
values.
Add instructions to perform add, multiply, and FMA on packed FP32 types.
Change-Id: Ib838bff91a10e02e013cc7c33ec3d91ff08647b0
This change adds all of the missing flat/global atomics up to including
the new atomics in gfx90a (MI200). Adds all decodings and instruction
implementations with the exception of __half2 which does not have a
corresponding data type in gem5. This refactors the execute() and
completeAcc() methods by creating helper functions similar to what
initiateAcc() uses. This reduces redundant code for global atomic
instruction implementations.
Validated all except PK_ADD_F16, ADD_F32, and ADD_F64 which will be done
shortly. Verified the source/dest register sizes in the header are
correct and the template parameters for the new execute()/completeAcc()
methods are correct.
Change-Id: I4b3351229af401a1a4cbfb97166801aac67b74e4
The AMDKernelCode struct is very outdated. Most of the fields are no
longer used and have been replaced with new fields that are used.
Therefore in order to support the new fields the code object needs to be
updated. The new structure is based on the table located at
https://llvm.org/docs/AMDGPUUsage.html#code-object-v3-kernel-descriptor
Most notably this adds the new compute_pgm_rsrc3 and kernarg preload
fields which are new features in gfx90a (MI200). The accum_offset in
compute_pgm_rsrc3 and kergarg preload values are necessary to run
application which enable those features and therefore a way to check
their values is needed.
Also noteable is the removal of enable_sgpr_workgroup_id_{X,Y,Z}. These
seem to be unused in all versions of ROCm that gem5 supports and
therefore these fields can be removed. They are replaced with a reserved
field in the new code object.
Change-Id: I5542442e1e5961b05e17affad0adb5186d6d9d1a
Use the opSelectorToRegSym which will print the full range of VGPRs
(e.g., will now print v[2:3] instead of v2 when the source / dest is
64-bits). Fixes atomic disassembly prints. Now shows "glc" if GLC bit is
enabled. Fixes some VGPR fields being printed as an SGPR in places where
the 9-bit register index bit is implied (e.g., VDST).
This makes it easier to use a GPUExec trace to match with LLVM
disassembly when debugging.
Change-Id: Ia163774850f0054243907aca8fc8d0361e37fdd5
This adds the VOP3P and VOP3P_MAI encodings from the MI200 spec. These
instructions are used for packed math and miSIMD instructions. The first
19 VOP3P opcodes are implemented and validated against hardware. This
includes all instructions which operate on one dword containing two
packed 16-bit values of fp16, int16_t, or uint16_t.
Implement one MFMA instruction for now which was also validated against
hardware.
This is useful in other ISAs to implement FP16 computation. For example,
it can be used in the GPU model. The ARM specific misc register is
ignored in that case.
Change-Id: I339ac0ccd9be4371b0f220ad99068e5e12b3d263
This initialization method is used in gfx90a (MI200). Rather than using
three VGPRs for X,Y,Z dimensions of the kernel, pack them into one
register with 10-bits for each dimensions.
Change-Id: I8e5b681c8287779ff9f80451d6028e862322294a
The version is necessary for determining the correct ABI init process.
Add it to the task queue so it is accessible when doing ABI init.
Change-Id: If77434b0f93614057b5c40fcf612d59b54e05dbb
this adds an option --with-libcxx, that adds the -stdlib=libc++ flag to
link against libc++ instead of libstdc++ on Linux. Currently this is
only possible with clang and may not work with all build configurations
(e.g. protobuf linked against libstdc++), so this needs to be opt-in
rather than being on by default for clang whenever libc++ is detected.
Change-Id: Ib4022a58bb2dbd32417c58f01c7443a02ff710fe
The args.cpu_type is not a type but a string so the isinstance checking
will always fail and an assertion will always be thrown
A cherry-pick of #684 to develop
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Co-authored-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
After removing `get_runtime_isa`, the `send_evicts` function in the ruby
configs assumes that there is an ISA built. This change short-circuits
that logic if the current build is the NULL (none) ISA.
Change-Id: I75fefe3d70649b636b983c4d2145c63a9e1342f7
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
These tests previously used "build/NULL" but due to changes in the
"Ruby" and "garnet_synth_traffic.py" scripts, "NULL" fails as the script
exists "X86TimingSimple" with MESI_Two_Level.
This change fixes the tests by compiling and using the correct
compilation of gem5. It shouldn't affect the tests in any negative way.
As far as I'm aware it does not matter what ISA is used for these tests.
Change-Id: I8ae84b49f65968e97bef4904268de5a455f06f5c
configs/ruby/Ruby.py fails when `DerivO3CPU` is not compiled into the
gem5 binary. The `isinstance` check fails. This fix addds a guard.
Change-Id: I1e5503ab18ec94683056c6eb28cebeda6632ae8e
- The bytesRead and bytesWritten stat had duplicate names. Updated
bytesRead and bytesWritten for dram_interface and nvm_interface
Change-Id: I7658e8a0d12ef6b95819bcafa52a85424f01ac76
After removing `get_runtime_isa`, the `send_evicts` function in the ruby
configs assumes that there is an ISA built. This change short-circuits
that logic if the current build is the NULL (none) ISA.
- The bytesRead and bytesWritten stat had duplicate names. Updated
bytesRead and bytesWritten for dram_interface and nvm_interface
Change-Id: I7658e8a0d12ef6b95819bcafa52a85424f01ac76
After removing `get_runtime_isa`, the `send_evicts` function in the ruby
configs assumes that there is an ISA built. This change short-circuits
that logic if the current build is the NULL (none) ISA.
Change-Id: I75fefe3d70649b636b983c4d2145c63a9e1342f7
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
The WriteUniqueZero is an immediate write to a Snoopable address region
that does not require any data transfer (cacheline is zeroed)
Change-Id: Ia8c9b40e08a3b7d613f0b62ce0ac4b0547860871
Reviewed-by: Tiago Muck <tiago.muck@arm.com>
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
- This script copies all resources from a mongodb database locally The
script creates a resources.json and downloads all the resources. It also
updates the resources.json to point to these local downloads instead of
the cloud bucket.
Change-Id: I15480c4ba82bbf245425205c9c1ab7c0f3501cc3
Note: A bug was identified in that the one of the special file paths,
namely /proc/meminfo contained an extra trailing /, implicitly making
the incorrect assumption that meminfo was a directory, when it is, in
fact, a (pseudo-)file. This was causing application in SE mode to fail
opening the meminfo pseudo-file with errno 13. This commit fixes this
issue.
Change-Id: I93fa81cab49645d70775088f1e634f067b300698
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
These tests previously used "build/NULL" but due to changes in the
"Ruby" and "garnet_synth_traffic.py" scripts, "NULL" fails as the script
exists "X86TimingSimple" with MESI_Two_Level.
This change fixes the tests by compiling and using the correct
compilation of gem5. It shouldn't affect the tests in any negative way.
As far as I'm aware it does not matter what ISA is used for these tests.
Change-Id: I8ae84b49f65968e97bef4904268de5a455f06f5c
configs/ruby/Ruby.py fails when `DerivO3CPU` is not compiled into the
gem5 binary. The `isinstance` check fails. This fix addds a guard.
Change-Id: I1e5503ab18ec94683056c6eb28cebeda6632ae8e
- This script copies all resources from a mongodb database locally The
script creates a resources.json and downloads all the resources. It also
updates the resources.json to point to these local downloads instead of
the cloud bucket.
Change-Id: I15480c4ba82bbf245425205c9c1ab7c0f3501cc3
The args.cpu_type is not a type but a string so the isinstance checking
will always fail and an assertion will always be thrown
Change-Id: I6a88d1a514bb323c517949632f4e76be40e87e8c
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Note: A bug was identified in that the one of the special file paths,
namely /proc/meminfo contained an extra trailing /, implicitly making
the incorrect assumption that meminfo was a directory, when it is, in
fact, a (pseudo-)file. This was causing application in SE mode to fail
opening the meminfo pseudo-file with errno 13. This commit fixes this
issue.
Change-Id: I93fa81cab49645d70775088f1e634f067b300698
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Stash requests will simply be discarded by the Home Node This will
return a CompI response to the RNF
Change-Id: I9c2ce5d4d42f380d1a554933d381cf8a8590ba22
Reviewed-by: Tiago Muck <tiago.muck@arm.com>
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Correct the instruction flags of RISC-V vector store instructions, such
as `vse64_v`, `vse32_v`. The `vse64_v` in `decoder.isa` is
`Mem_vc.as<uint64_t>()[i] = Vs3_ud[i];` and it will generate the code
`Mem.as<uint64_t>()[i] = Vs3[i];`. The current regex of assignRE only
mark the operand `Mem` as `dest` only if meet the formats like `Mem = Rd`
or `Mem[i] = Rd` because the code ` = Rd` or `[i] = Rd` match the
`assignRE` respectively. For the expression `Mem.as<uint64_t>()[i]`,
the operand `Mem` will falsely mark the operand as `src` because the
code `.as<uint64_t>()[i]` is not match the `assignRE`.
The PR will ensure the operand `Mem` is dest for the format like
`Mem.as<xxx>()[i] = yyy`.
Change-Id: I9c57986a64f1efb81eb9c7ade90712b118e0788d
Due to a change introduced in https://github.com/gem5/gem5/pull/625, a
gem5 resource will not download any external files until
`get_local_path()` is called. In the construction of the Looppoint
Resources this function was not called, the `local_path` variable was
called directly. As such, an error occured.
The downside of this fix is the Looppoint resources external files are
downloaded when `obtain_resource` is called, thus the bandwidth savings
introduced with https://github.com/gem5/gem5/pull/625 will not occur for
Looppoint resources. However, https://github.com/gem5/gem5/issues/644
proposes a fix which would supercede the
https://github.com/gem5/gem5/pull/625 solution.
Change-Id: I52181382a03e492ec1cb58b01e71bc4820af9ccc
The value of a `WorkloadResource`'s additional parameter may not always
be a string. It can be any JSON value (integer, a list, a dict, ect.).
For Looppoint resources we have additional parameters such as a List of
region start points.
The assert inside workloads checking the type of the value breaks
certain usecase and is therefore removed in this commit.
Change-Id: Iecb1518082c28ab3872a8de888c76f0800261640
There were two small bugs preventing gem5 from obtaining Looppoint
resources.
1. When obtained via a `WorkloadResource` there was an assert which
assumed the values in the resource's DB entry's `additional_parameter`
field were of type string. This is not the case. For Looppoint resources
there are additional parameters which are arrays.
2. Due to changes introduced in https://github.com/gem5/gem5/pull/625,
the Looppoint CSV and JSON files were not being downloaded when needed.
This was fixed by replacing access to the `local_path` variable with a
call to `get_local_path()`.
Because each vector load is fragmented into 64 byte cache-aligned
chunks, and one page-table walk is issued per fragment on tlb miss,
walks start to accumulate on a pending queue, which is processed in a
blocking way (no pending walks can be issued while one is being
processed). This adds noticeable latency on vector loads when VLEN is
sufficiently large.
This commit fixes the issue by allowing walks to be squashed if a TLB
lookup hits just before starting the walk on `startWalkWrapper`. This
idea was taken from the ARM walker.
Due to a change introduced in https://github.com/gem5/gem5/pull/625, a
gem5 resource will not download any external files until
`get_local_path()` is called. In the construction of the Looppoint
Resources this function was not called, the `local_path` variable was
called directly. As such, an error occured.
The downside of this fix is the Looppoint resources external files are
downloaded when `obtain_resource` is called, thus the bandwidth savings
introduced with https://github.com/gem5/gem5/pull/625 will not occur for
Looppoint resources. However, https://github.com/gem5/gem5/issues/644
proposes a fix which would supercede the
https://github.com/gem5/gem5/pull/625 solution.
Change-Id: I52181382a03e492ec1cb58b01e71bc4820af9ccc
The value of a `WorkloadResource`'s additional parameter may not always
be a string. It can be any JSON value (integer, a list, a dict, ect.).
For Looppoint resources we have additional parameters such as a List of
region start points.
The assert inside workloads checking the type of the value breaks
certain usecase and is therefore removed in this commit.
Change-Id: Iecb1518082c28ab3872a8de888c76f0800261640
The stats initialization in the AbstractMemory allocates the space
according to the max requestors of the System. This may cause issues in
multiple system simulation.
Given there are two system A and B. A has one requestor and a memory,
while B has two requestors. When the requestor with requestor id 2
sending requests to the meomry in A, the simulator would crash because
requestor id 2 is out of the allocated space.
Current solution is adding a SysBridge between across A and B which
would rewrite the requestor id to a valid one. This solution works but
it needs to the bridge at the correct boundary which may not easy. In
addition, the stats would record a mapped data which may not accurate.
To reduce the complexity, we add an flag to AbstractMemory to control
the stats. If users don't want the statistics and want to solve the
cross system issue simply, users can disable the statistics collection.
We also makes the flag by default True to not disturb current users.
Correct the instruction flags of RISC-V vector store instructions, such
as `vse64_v`, `vse32_v`. The `vse64_v` in `decoder.isa` is
`Mem_vc.as<uint64_t>()[i] = Vs3_ud[i];` and it will generate the code
`Mem.as<uint64_t>()[i] = Vs3[i];`. The current regex of assignRE only
mark the operand `Mem` as `dest` only if meet the formats like `Mem =
Rd` or `Mem[i] = Rd` because the code ` = Rd` or `[i] = Rd` match the
`assignRE` respectively. For the expression `Mem.as<uint64_t>()[i]`, the
operand `Mem` will falsely mark the operand as `src` because the code
`.as<uint64_t>()[i]` is not match the `assignRE`.
The PR will ensure the operand `Mem` is dest for the format like
`Mem.as<xxx>()[i] = yyy`.
Correct the instruction flags of RISC-V vector store instructions, such
as `vse64_v`, `vse32_v`. The `vse64_v` in `decoder.isa` is
`Mem_vc.as<uint64_t>()[i] = Vs3_ud[i];` and it will generate the code
`Mem.as<uint64_t>()[i] = Vs3[i];`. The current regex of assignRE only
mark the operand `Mem` as `dest` only if meet the formats like `Mem = Rd`
or `Mem[i] = Rd` because the code ` = Rd` or `[i] = Rd` match the
`assignRE` respectively. For the expression `Mem.as<uint64_t>()[i]`,
the operand `Mem` will falsely mark the operand as `src` because the
code `.as<uint64_t>()[i]` is not match the `assignRE`.
The PR will ensure the operand `Mem` is dest for the format like
`Mem.as<xxx>()[i] = yyy`.
Change-Id: I9c57986a64f1efb81eb9c7ade90712b118e0788d
Inspired by the similar feature in ARM's full system workload, this
change adds
an option to halt gem5 simulation if the guest system encounter kernel
panic
or kernel oops.
On RiscvISA::BootloaderKernelWorkload, by default, the simulation
will exit upon kernel panic, while kernel oops will not induce
simulation halt.
This is because the system will essentially do nop after a kernel panic,
while the
system might be still functional after a kernel oops.
Dumping kernel's dmesg is useful for diagonizing the cause of kernel
panic, so
ideally, we want to dump the guest's dmesg to the host. However, due to
a bug
described in [1], kernel v5.18+ dmesg might not be dumped properly.
Hence, the
dmesg will not be dumped to the host.
On RiscvISA::FsLinux, this feature is turned off by default as the
symbols from the
official RISC-V kernel resource are stripped from the binary. However,
if this feature
is enable, the dmesg will be dumped to the host system.
[1] https://github.com/gem5/gem5/issues/550
Change-Id: I8f52257727a3a789ebf99fdd4dffe5b3d89f1ebf
1. Fix the wrong ISA detect of get_isa function
2. Fix the typo ObjectLIst.cpu_list
3. Fix missing PageTableWalkerCache
4. Fix the invalid default cpu_type paramter
Change-Id: I217ea8da8a6d8e712743a5b32c4c0669216ce6c4
The stats initialization in the AbstractMemory allocates the space
according to the max requestors of the System. This may cause issues in
multiple system simulation.
Given there are two system A and B. A has one requestor and a memory,
while B has two requestors. When the requestor with requestor id 2
sending requests to the meomry in A, the simulator would crash because
requestor id 2 is out of the allocated space.
Current solution is adding a SysBridge between across A and B which
would rewrite the requestor id to a valid one. This solution works but
it needs to the bridge at the correct boundary which may not easy. In
addition, the stats would record a mapped data which may not accurate.
To reduce the complexity, we add an flag to AbstractMemory to control
the stats. If users don't want the statistics and want to solve the
cross system issue simply, users can disable the statistics collection.
We also makes the flag by default True to not disturb current users.
Change-Id: Ibb46a63d216d4f310b3e920815a295073496ea6e
This includes mla/s index version implementation using the existing template code
to avoid code repeatition.
Change-Id: If1de84e01dec638e206c979ca832308ebc904212
RISCV full system workloads have the capability of exit the simulation loop
upon the guest's kernel panic/oops. This change adds more stdlib exit event types
to accommodate the corresponding gem5 exits upon the guest's kernel panic and
kernel oops.
Change-Id: I3a4f313711793a473c6f138ff831b948034d0bb6
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
Inspired by a similar feature in ARM's full system workload, this change adds
an option to halt gem5 simulation if the guest system encounter kernel panic
or kernel oops.
On RiscvISA::BootloaderKernelWorkload, by default, the simulation
will exit upon kernel panic, while kernel oops will not induce simulation halt.
This is because the system will essentially do nop after a kernel panic, while the
system might be still functional after a kernel oops.
Dumping kernel's dmesg is useful for diagonizing the cause of kernel panic, so
ideally, we want to dump the guest's dmesg to the host. However, due to a bug
described in [1], kernel v5.18+ dmesg might not be dumped properly. Hence, the
dmesg will not be dumped to the host.
On RiscvISA::FsLinux, this feature is turned off by default as the symbols from the
official RISC-V kernel resource are stripped from the binary. However, if this feature
is enable, the dmesg will be dumped to the host system.
[1] https://github.com/gem5/gem5/issues/550
Change-Id: I8f52257727a3a789ebf99fdd4dffe5b3d89f1ebf
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
Co-authored-by: Jason Lowe-Power <jason@lowepower.com>
`get_runtime_isa()` has been deprecated for some time. It is a leftover
piece of code from when gem5 was compiled to a single ISA and that ISA
used to configure the simulated system to use that ISA. Since multi-ISA
compilations are possible, `get_runtime_isa()` should not be used.
Unless the gem5 binary is compiled to a single ISA, a failure will
occur.
The new proceedure for specify which ISA to use is by the setting of the
correct `BaseCPU` implementation. E.g., `X86SimpleTimingCPU` of
`ArmO3CPU`.
This patch removes the remaining `get_runtime_isa()` instances and
removes the function itself. The `SimpleCore` class has been updated to
allow for it's CPU factory to return a class, needed by scripts in
"configs/common".
The deprecated functionality in the standard library, which allowed for
the specifying of an ISA when setting up a processor and/or core has
also been removed. Setting an ISA is now manditory.
Fixes#216.
The [PR](https://github.com/gem5/gem5/pull/390) adds support for new
bootloader and linux kernel. However after applying the changes the
arguments are not passed correctly to the kernel resulting in kernel
panic during simulations. This commit fixes the issue.
This seperation was only for convenience while GPU tests were under
development and rapidly changing. This test merges the GPU tests into
the weekly tests where they belong.
Change-Id: I0e7118e863dba51334de89b3bbc3592374ef63ec
clang correctly found that the functions `inCache`, `hasBeenPrefetched`
and `inMissQueue` had the wrong signatures in the DVM funcs files. These
functions are unused, so this change just updates their signatures.
Change-Id: Id669ff661e1c6c46eaf04ea1f17cd9866a9e49ed
- updated the x86-npb-benchmarks.py to use npb workloads and suites.
The suites and workloads are not in the database are also waiting
feedback. I am attaching the JSON file here.
[npb_workloads_suite.json](https://github.com/gem5/gem5/files/13431116/npb_workloads_suite.json)
To run the x86-npb-benchmarks.py script use the
GEM5_RESOURCE_JSON_APPEND env variable. The full command is:
```
GEM5_RESOURCE_JSON_APPEND=[path to npb_workloads_suite.json] ./build/X86/gem5.opt configs/example/gem5_library/x86-npb-benchmarks.py --benchmark [benchmark]
```
Change-Id: I248e6452ea4122e9260e34e4368847660edae577
Currently, if the Capstone header file is found in the host system,
scons will try to build the ArmCapstoneDisassembler regardless of the
gem5 target ISA. This is causing problem when the host has Capstone, but
the gem5 target ISA is not arm. Compiling gem5 in this case will cause
errors, e.g., ArmISA and ArmSystem is not found.
This change aims to prevent building the ArmCapstoneDisassembler when
the gem5 target ISA is not arm.
Ref:
[1] The Arm Capstone PR https://github.com/gem5/gem5/pull/494
Change-Id: I1e714d34aec8fe2a2af8cd351536951053a4d8a5
This Pull-Request addresses gem5 Issue #550. The code that dumps the
Dmesg buffer is now templated on the two variants of the `Metadata`
structure, and the correct one is chosen based on the detected Kernel
version.
To support this functionality, the pull request also adds Symbol Size
data to the loader Symbol Table, and adds a method to query the Kernel
Version from the image in guest memory. The new attributes in the Symbol
class are de-serialized speculatively, so no checkpoint upgrader is
required to support this change.
This patch reworks the Linux Kernel panic and oops events. The code has
been re-factored to provide re-usable events that can be applied to all
ISAs from the base `KernelWorkload` `SimObject`. At the moment they are
installed for the Arm workloads.
This update also provides more configuration options that can be
specified using the new `KernelPanicOopsBehaviour` enum. The options are
applied to the Kernel Workload parameters `on_panic` and `on_oops` which
are available to all subclasses of `KernelWorkload`.
The main rationale for this reworking is to add the option to cleanly
exit the simulation after dumping the Dmesg buffer. Without this option,
the simulation would continue running after a Kernel panic. If system
components (e.g. a system timer) keep the event queue alive, this causes
the simulation to run slowly to the maximum allowed tick.
The website's documentation on building gem5 now references
<gem5_base>/gem5_build in addition to <gem5_base>/build. So, we should
ignore files in that directory as well as the build directory.
Change-Id: Ie226545e04b885ce81b3c17e18b5052ed64af328
This change decouple's the downloading of a resource from it's data.
With this change the `obtain_resource` function returns the
`AbstractResource` implementation which contains the data. The resource
itself (e.g., the actual disk image, binary, file, etc.) is only
downloaded to the host system, if not already present, upon the
`get_local_path` call.
`get_local_path` is the function used by gem5 to ultimately load the
resource into a simulation, therefore this change ensures we only
download resources when they are loaded into a simulation.
This change is not ideal and comes with the following caveats:
1. The `downloader` function is created in `obtain_workload` and passed
to the `AbstractResource` implementation for later use. This function
comes with the following requirements:
* The function will download the resource to `local_path`.
* The function will not re-download the resources if already present
as this function is called _everytime_ `get_local_path` is called.
2. The directories needed to store `local_path` are created in
`obtain_workload` regardless. Ergo even if the resource is not used and
`get_local_path` is never called these directories are still created.
In keeping with this efficiency `ShadowResource` is introduced to allow
the storing of just the resource ID and Version of a resource with
additional information only obtained when requested.
This change does the following,
- Change the name of several python parameter names of the
RiscvBootloaderKernelWorkload. This is done to conform the expectation
from the stdlib, e.g., the kernel path must be `object_file`, and the
boot parameter must be `command_line`.
- Use RiscvBootloaderKernelWorkload by default for all full system
RISC-V simulations. RiscvBootloaderKernelWorkload is a superset of
RiscvFsWorkload.
In handleBeginReq, a timing request is sent. If the receiver rejects the
request, the bridge will save the pointers of the original transaction
object and the generated gem5 packet. After a recvReqRetry-signal and a
successful timing request, the variable for transaction object pointer,
but not for the gem5 packet, is set to nullptr. When a new transaction
with the phase BEGIN_REQ arrives, the assertion in handleBeginReq that
there is no pending gem5 packet fails.
Therefore, the variable pendingPacket has to be set to nullptr in
recvReqRetry after a successful timing request too.
Change-Id: I876f8f88e1893e8fdfa3441ed2ae5ddc39cef2ce
Co-authored-by: Robert Hauser <robert.hauser@uni-rostock.de>
Update the RubyCache debug flag prints in CacheMemory to be more
descriptive and make clearer what is happening in a given function.
This makes it easier to determine what is happening when looking at the
RubyCache debug flags prints.
Change-Id: Ieee172b6df0d100f4b1e8fe4bba872fc9cf65854
Previously, the `subprocess` module was used to execute shell command
installing precommit hook. However, after #431 [1], the import of the
`subprocess` module was overriden by `asyncio.subprocess`, which has a
different API to execute the shell command. This change removes the
`asyncio.subprocess` import.
[1] https://github.com/gem5/gem5/pull/431
Change-Id: I9a7d51f85518089d258ab57c5d849a36dcf128e9
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
Linux v5.18+ changed the format of one of the data structures used to
implement the printk ring buffer. This caused the gem5 feature to dump
the printk messages on Kernel panic to fail for Kernels 5.18.0 and
later.
This patch updates the printk messages dump feature to support the new
printk ring buffer format when Kernels 5.18 and later are detected.
This patch addresses gem5 Issue #550:
https://github.com/gem5/gem5/issues/550
Change-Id: I533d539eafb83c44eeb4b9fbbcdd9c172fd398e6
Reported-by: Hoa Nguyen <hn@hnpl.org>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
This function, `extract_kernel_version`, attempts to find the Kernel
version by searching the `uts_namespace` struct at the exported symbol
`init_uts_ns`. This structure contains the Kernel version as a string,
and is referenced by `procfs` to return the Kernel version in a
running system.
Different versions of the Kernel use different layouts for this
structure, and the top level structure is marked as
`__randomize_layout`, so the exact layout in memory cannot be relied
on. Because of this, `extract_kernel_version` takes the approach of
searching the memory holding the structure for printable strings, and
parsing the version from those strings.
Change-Id: If8b2e81a041af891fd6e56a87346a341df3c9728
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Add a size field to the Symbol objects in the Symbol Table. This
allows client code to read the data associated with a symbol in cases
where the data type/size is not known beforehand (e.g. if an object's
size might be different between different versions of a
workload/kernel).
Access is mediated via the `sizeOrDefault()` method, which requires
client code to specify a fallback size. Since correct size data may
not be available (for example in legacy checkpoints), this forces the
client code to consider the 'missing size data' case.
Change-Id: If1a47463790b25beadf94f84382e3b7392ab2f04
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
This commit converts `gem5::loader::Symbol` to a full class with
private members, enforcing encapsulation. Until now client code has
been able to (and does) access members directly.
This change will enable class invariants to be enforced via accessor
methods.
Change-Id: Ia0b5b080d4f656637a211808e13dce1ddca74541
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
The purpose of a `ShadowResource` is a resource which only contains the
ID and Version information, not any additional information about the
resource thus avoiding the `obtain_resource` call.
When attributes of the `ShadowResource` are accessed which can only be
obtained via `obtain_resource` the `ShowResource` calls the function and
returns what is required.
This is useful for `Suite` resources which contain several workloads
and resources which may not all be needed when the `Suite` object is
first instantiated.
Change-Id: Icc56261b2c4d74e4079ee66486ddae677bb35cfa
Update the RubyCache debug flag prints in CacheMemory to be more
descriptive and make clearer what is happening in a given function.
This makes it easier to determine what is happening when looking at the
RubyCache debug flags prints.
Change-Id: Ieee172b6df0d100f4b1e8fe4bba872fc9cf65854
The gem5 standard library hardcoded some parameters of the workload.
E.g., the kernel filename must be `object_file`.
Change-Id: I5eeb7359be399138693eaba0738eaf524c59408f
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
DRAMSys requires cmake 3.24 or greater. By default neither Ubuntu 22.04
or 20.04 delevery this by APT.
In both cases wget is required. In 20.04 OpenSSL is required.
Then ext/dramsys README file has been updated to state this requirement.
This change decouple's the downloading of a resource from it's data.
With this change the `obtain_resource` function returns the
`AbstractResource` implementation which contains the data. The resource
itself (e.g., the actual disk image, binary, file, etc.) is only
downloaded to the host system, if not already present, upon the
`get_local_path` call.
`get_local_path` is the function used by gem5 to ultimately load the
resource into a simulation, therefore this change ensures we only
download resources when they are loaded into a simulation.
This change is not ideal and comes with the following caveats:
1. The `downloader` function is created in `obtain_workload` and passed
to the `AbstractResource` implementation for later use. This function
comes with the following requirements:
* The function will download the resource to `local_path`.
* The function will not re-download the resources if already present
as this function is called _everytime_ `get_local_path` is called.
2. The directories needed to store `local_path` are created in
`obtain_workload` regardless. Ergo even if the resource is not used and
`get_local_path` is never called these directories are still created.
Change-Id: I3f0e9a0099cba946630d719c3d17b7da0bccf74a
This patch adds supports for using the "classic" prefetchers with ruby
cache controllers.
This pull request includes a few commits making the changes in this
order:
- Refactor decouples the classic cache and prefetchers interfaces
- Extras probes for later integration with ruby
- General ruby-side support
- Adds support for the CHI protocol
Commit [mem-ruby: support prefetcher in CHI
protocol](2bdb65653b)
may be used as example on how to add support for other protocols.
JIRA issues that may be related to this pull request:
https://gem5.atlassian.net/browse/GEM5-457https://gem5.atlassian.net/browse/GEM5-1112
The change will only add include and library path if the fastmodel is
required to build. The change will benefit for most of gem5 build.
Change-Id: I98c20bd1470b7227940036199e02bc001e307eac
Currently, if you try to use ctrl-c while python code is running nothing
happens. This is not ideal. This change enables users to use ctrl-c
while python is running (e.g., when a large disk image is downloading).
To do this, we moved the `initSignals` function in gem5 from `main` to
the simulate loop. Thus, every time the simulate loop starts (i.e., is
called from python) gem5 will install its signal handlers. Also, when
the control is returned to python, we put python's default SIGINT
handler back.
Change-Id: I14490e48d931eb316e8c641217bf8d8ddaa340ed
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
DRAMSys requires cmake 3.24 or greater. By default neither Ubuntu 22.04
or 20.04 delevery this by APT.
In both cases wget is required. In 20.04 OpenSSL is required.
Change-Id: I51a7f8a8a46e8cf1908a120adb9289aa3907ccda
- The resources field in workload now changed to a dict of id and
version from a string with just the id.
There was an if statement added to support both versions in develop.
Removing the if statement so that 23.1 supports the new changes only.
Change-Id: Id8dc3f932f53a156e4fb609a215db7d85bd81a44
Converts CLFLUSHOPT/WB/FLUSH operations from Write to Read operations
during address translation so that they don't trigger a page fault when
done on write-protected pages.
Solves #226
Linking the gem5 binary consists of linking numerous object files. By
default, ld optimizes for speed over for memory consumption [1]. This
leads to the huge memory consumption of gem5 at the linking stage.
This patch adds an option to add the `--no-keep-memory` flag to ld.
According to the documentation [1], this flag optimizes the memory usage
of ld.
[1]
https://ftp.gnu.org/old-gnu/Manuals/ld-2.9.1/html_chapter/ld_2.html#IDX133
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
Some variables hava narrow datatypes that overflow on large VLEN values.
For example, the maximum number of microops for LMUL=8 SEW=8 and
VLEN=64K is 2^16.
Change-Id: I5cce759f040884e09ce83bee7e54a62c4b42c5aa
Co-authored-by: Adrià Armejach <adria.armejach@bsc.es>
With the use of large RVV vectors (i.e., 8K or 16K bits) and a limited
number of cacheLoadPorts, some loads take multiple cycles to execute.
This triggered certain conditions when store-to-load forwarding happens
in the middle of the execution of a load that already has outstanding
packets.
First, after store-to-load forwarding the request is marked as discarded
and the load is immediately writtenback, which triggers a writebackDone
that tries to delete the request, triggering an assert as it still has
outstanding packets. This patch avoid deleting the request leaving it
self owned, it will be deleted when the last packet arrives in
packetReplied.
Second, this patch avoid checking snoops on discarded requests by
checking if the request exists.
Change-Id: Icea0add0327929d3a6af7e6dd0af9945cb0d0970
Co-authored-by: Adrià Armejach <adria.armejach@bsc.es>
This patch adds RubyPrefetcherProxy, which provides means to inject
requests generated by the "classic" prefetchers into a SLICC prefetch
queue. It defines defines notifyPf* functions to be used by protocols
to notify a prefetcher. It also includes the probes required to
interface with the classic implementation.
AbstractController defines the accessor needed to snoop the caches.
A followup patch will add support for RubyPrefetcherProxy in the
CHI protocol.
Related JIRA:
https://gem5.atlassian.net/browse/GEM5-457https://gem5.atlassian.net/browse/GEM5-1112
Additional authors:
Tuan Ta <tuan.ta2@arm.com>
Change-Id: Ie908150b510f951cdd6fd0fd9c95d9760ff70fb0
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
hasBeenPrefetched can now take a requestor id and returns true only if
the block was prefetched by a prefetcher with the same id. This may be
necessary to properly train multiple prefetchers attached to the same
cache. If returns true if the block was prefetched by any prefetcher
when the id is not provided.
Related JIRA:
https://gem5.atlassian.net/browse/GEM5-457https://gem5.atlassian.net/browse/GEM5-1112
Change-Id: I205e000fd5ff100e5a5d24d88bca7c6a46689ab2
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Remove the prefetch_on_access and prefetch_on_pf_hit from BaseCache.
BasePrefetch no longer expects this params to exist in the parent.
Configurations that set these parameter using the cache object were
fixed.
Change-Id: I9ab6a545eaf930ee41ebda74e2b6b8bad0ca35a7
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
This patches decouples the prefetchers from the cache implementation
as the first step to allow using the classic prefetchers with ruby
caches. The prefetchers that need do cache lookups can do so using
the accessor object provided when the probes are notified. This may
also facilitate connecting the same prefetcher to multiple caches.
Related JIRA:
https://gem5.atlassian.net/browse/GEM5-457https://gem5.atlassian.net/browse/GEM5-1112
Change-Id: I4fee1a3613ae009fabf45d7b747e4582cad315ef
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
MOV instructions 8C and 8E can be prefixed with a REX prefix to extend
the source/destination register.
However, the R bit in REX will be applied to the segment register.
The decoder file checks for valid segment registers, checking the
MODRM_REG only, however, later this will be extended with the REX_R when
adding the register to the sources/destinations of the instruction.
This will trigger an assert.
Additionally, MOV instructions of various miscelaneous registers are
also not check for being valid when taking into account the REX_R bit.
This patch checks that the REX_R is not set, otherwise, UD2 will be
generated.
In a high performance CPU there is no other way than a BTB hit
to know about a branch instruction and its type. For low-end CPU's
pre-decoding might sit in from of the BPU to provide this information.
Currently, the BPU models only low-end behavior and updates the
RAS and the indirect branch prediction even without a BTB hit.
This patch adds three things to model the correct behavior for high-end
CPUs.
1. A check before the RAS and indirect predictor wheather there was
a BTB hit or not. Only for BTB hits the BPU will consolidate RAS, and
indirect predictor.
2. Since, this check requires a BTB hit for indirect branches they must
also be installed into the BTB. For returns this was already done.
3. Finally, the BTB update previously happened at squash (decode
or commit). Since this can be out-of-order that means branches from
the false path can get installed without ever been retired.
The memory translation require supervisor mode implement. If the
supervisor mode is not implemented, the satp CSR is not exists and
should not do address translation
Change-Id: Ie6c8a1a130d0aab0647b35e0f731f6b930834176
Converts CLFLUSHOPT/WB/FLUSH operations from Write to Read operations
during address translation so that they don't trigger a page fault
when done on write-protected pages.
Change-Id: I20e89cc0cb2b288b36ba1f0ba39a2e1bf0f728af
The PR contains the following changes:
- Move all of the config options(`env["CONF"]`) from SConsopt to Kconfig
files
- Update `build_opts` files to Kconfig option formats
- The Ruby Protocol files are only built if `RUBY=y`
- Remove the default-default build target
- Kconfig commands are included in the PR:
- defconfig
- setconfig
- meunconfig
- guiconfig
- listnewconfig
- savedefconfig
- oldconfig
- olddefconfig
- Add the `python3-tk` package dependencies
Jira issue: https://gem5.atlassian.net/browse/GEM5-1211
The GPU device keeps a local copy of each ring buffers read pointer
(rptr) to avoid constant DMAs to/from host memory. This means it needs
to be periodically updated on the host side as the driver uses this to
determine how much space is left in the queue and may hang if it believe
the queue is full. For user-mode queues, this already happens when
queues are unmapped. For kernel mode queues (e.g., HIQ, KIQ) the rptr is
never updated leading to a hang.
In this patch the rptr for *all* queues is reported back to the kernel
whenever the queue reaches an empty state (rptr == wptr). Additionally
to handle PM4 queue wrap-around, the queue processing function checks if
the queue is not empty instead of rptr < wptr. This is state because the
driver fills PM4 queues with NOP packets on initialization and when wrap
around occurs.
Change-Id: Ie13a4354f82999208a75bb1eaec70513039ff30f
#525 Updated DRAMSys to v5.0. This PR further improves v5.0
inforporation into gem5 by better managing its new dependencies and
updating the DRAMSys tests to use v5.0.
This PR:
1. Adds a check which throws warning if DRAMSys cannot be build due to a
missing `cmake` instead of failing with a build error. `cmake` is not a
hard gem5 requirement. It is only required to build DRAMSys in the cases
it is required. It is therefore prudent to not fail a build in cases
`cmake` is not present on the host system.
2. Updates the "all-dependency" Docker images to include the optional
dependencies `git-lfs` (needed to clone the DRAMSys repo when running
the command outlined in ext/dramsys/README -- introduced in #525) and
`cmake` (needed to build DRAMSys).
3. Updates the Weekly workflow's `dramsys-tests`' `Checkout DRAMSys` job
to clone DRAMSys in the same manner as outlined in ext/dramsys/README.
This ensures the `dram-systests` test the instructions we give users.
4. `.gitignore` is added to ext/dramsys to ignore the
ext/dramsys/DRAMSys directory when cloned for building and integration
into gem5.
(2.) Should fix our failing weekly tests:
https://github.com/gem5/gem5/actions/runs/6912511984/job/18808339821 and
(3.) will ensure the changes introduced in #525 are tested.
The V_PERM_B32 instruction is selecting the correct byte, but is
shifting into place moving by bits instead of bytes. The V_OR3_B32
instruction is calling the wrong instruction implementation in the
decoder.
This patch fixes both issues plus a bonus fix for GCN3's V_PERM_B32.
(GCN3 does not have V_OR3_B32).
Change-Id: Ied66c43981bc4236f680db42a9868f760becc284
This PR is fixing remaining issues in the ArmISA::Interrupt class; more
specifically it is enabling
virtual interrupts in secure mode (when FEAT_SEL2 is present). Previous
version was assuming no
virtual interrupt was possible in secure mode. We fix this assumption by
replacing the security check
with the EL2Enabled helper which closely matches the Arm pseudocode
This clone is updated to reflect the new advice given in
ext/dramasys/README that was introduced in PR
https://github.com/gem5/gem5/pull/525 to upgrade DRAMSysm to v5.0.
Change-Id: I868619ecc1a44298dd3885e5719979bdaa24e9c2
'cmake' is required to build DRAMSysm.
This is an optional dependency for compiling DRAMSys. It is therefore
not required. It is included in the "all-dependencies" Docker images
as they may be needed if DRAMSys is desired.
Change-Id: I1a3e1a6fa2da4d0116d423e9267d4d3095000d4e
CMake is not required to build gem5. It is only required to build
and link the optional DRAMSysm library. Therefore, if the DRAMSys repo
has been cloned but CMake is not present this patch ensures no attempt
at building or linking DRAMSysm is made. A warning is thrown
inform the user of the missing CMake.
Change-Id: I4d22e3a16655fd90f6b109b4e75859628f7d532d
This includes the isa and instruction implementations
of mla and mls indexed versions from ARM SVE2 ISA spec.
Change-Id: I4fbd0382f23d8611e46411f74dc991f5a211a313
MOV instructions 8C and 8E can be prefixed with a REX prefix to extend
the source/destination register. However, the R bit in REX will be
applied to the segment register. The decoder file checks for valid
segment registers, checking the MODRM_REG only, however, later this
will be extended with the REX_R when adding the register to the
sources/destinations of the instruction. This will trigger an assert.
This patch checks that the REX_R is not set, otherwise, UD2 will be
generated.
Change-Id: I78a93c35116232fe37e5ec50025e721b8c633c5f
The config option HAVE_CAPSTONE is added in the previous [1] and
the Kconfig options should be sync with it.
[1] https://github.com/gem5/gem5/pull/494
Change-Id: Id83718bc825f53d87d37d6ac930b96371209bdb3
The CL updates the Kconfig:
1. Replace the USE_NULL_ISA with BUILD_ISA
2. The USE_XXX_ISAs are depends on BUILD_ISA
3. If the BUILD_ISA is set, at least one of USE_XXX_ISAs must be set
4. Refactor the USE_KVM option
Change-Id: I2a600dea9fb671263b0191c46c5790ebbe91a7b8
In gem5, there are many equally valid and equally useful top level
targets which the user might want. It no longer makes sense to
arbitrarily pick one to be the default target. It makes sense to force
the user to actually specify what they want, instead of assuming it
must be the ARM debug binary.
There is currently an M5_DEFAULT_BINARY environment variable which
will change what the default binary is, if set. This change leaves
that in place, but removes the default-default, or in other words the
default that is used if M5_DEFAULT_BINARY is not set.
This way if the user knows what default they want, they can specify it
locally in their environment and avoid having to type it over and over
again, but we're not making an arbitrary choice at a more global level
without the context to know what actually makes sense.
Change-Id: I886adb1289b9879d53387250f950909a4809ed8b
These two utilities help update an old config to add settings for new
config options. The difference between them is that oldconfig asks what
new settings you want to use, while olddefconfig automatically picks the
defaults.
Change-Id: Icd3e57f834684e620705beb884faa5b6e2cc7baa
This helper utility lets you save the defconfig which would give rise to
a given config. For instance, you could use menuconfig to set up a
config how you want it with the options you cared about configured, and
then use savedefconfig to save a defconfig of that somewhere to the
side, in the gem5 defconfig directory, etc. Then later, you could use
that defconfig to set up a new build directory with that same config,
even if the kconfig options have changed a little bit since then.
A saved defconfig like that can also be a good way to visually see what
options have been set to something interesting, and an easier way to
pass a config to someone else to use, to put in bug reports, etc.
Change-Id: Ifd344278638c59b48c261b36058832034c009c78
This helper lists config options which are new in the Kconfig and which
are not currently set in the config file.
Change-Id: I0c426d85c0cf0d2bdbac599845669165285a82a0
This little utility lets you set particular values in an existing config
without having to open up the whole menuconfig interface.
Also reorganize things in kconfig.py a little to help share code between
wrappers.
Change-Id: I7cba0c0ef8d318d6c39e49c779ebb2bbdc3d94c8
This will let you specify *any* defconfig file, instead of implicitly
selecting one from the defconfig directory based on the variant name.
Change-Id: I74c981b206849f08e60c2df702c06534c670cc7c
If you call scons with the fist argument set to menuconfig, that means
to run menuconfig on the path following it. Or in other words, if
you ran this command:
scons menuconfig build/foo/bar
That would tell SCons to set up a build directory at the path
build/foo/bar, and then invoke menuconfig so you can set up its
configuration.
In addition to using this mechanism to set up a new build directory, you
can also use it to reconfigure an existing directory.
This supplements and does not replace the existing mechanism of using
"build/${VARIANT}" to select a config with defconfig.
Change-Id: Ief8e8c2ee6477799455c2004bef06c64be5cc1db
These targets are not necessarily obvious, and tell SCons to do useful
things, like build a particular version of the gem5 binary with a
particular configuration, or run the unit tests.
Add descriptions of these targets to the help so that they are much
more discoverable.
Change-Id: If84399be1a7155ff5f66f511efe1f1c241089c84
This root Kconfig file "source"s (includes) the base gem5 src/Kconfig
file, and also any optional Kconfig files found in the base of EXTRAS
directories. These will be called out in the menuconfig interface and
config files with the name of the EXTRAS directory they came from, and a
blank section will be present either if the Kconfig didn't exist, or it
did exist but had no options in it.
Change-Id: I54060d613f0e0ab9372bed37a2fe5849bf5bbcdb
These are not yet consumed by anything, but convert all the settings
from SCons variables to Kconfig variables.
If you have existing SConsopts files which need to be converted, you
should take a look at KCONFIG.md to learn about how kconfig is used in
gem5. You should decide if any variables need to be available to C++ or
kconfig itself, and whether those are options which should be detected
automatically, or should be up to the user. Options which should be
measured automatically should still be in SConsopts files, while user
facing options should be added to new or existing Kconfig files.
Generally, make sure you're storing c++/kconfig visible options in
env['CONF'][...]. Also remove references to sticky_vars since persistent
options should now be handled with kconfig, and export_vars since
everything in env['CONF'] is now exported automatically.
Switch SCons/gem5 to use Kconfig for configuration, except EXTRAS which
is still a sticky SCons variable. This is necessary because EXTRAS also
controls what config options exist. If it came from Kconfig itself, then
there would be a circular dependency. This dependency could
theoretically be handled by reparsing the Kconfig when EXTRAS
directories were added or removed, but that would be complicated, and
isn't supported by kconfiglib. It wouldn't be worth the significant
effort it would take to add it, just to use Kconfig more purely.
Change-Id: I29ab1940b2d7b0e6635a490452d05befe5b4a2c9
If gem5.build already exists within a directory, then that build
directory can be used without having to worry about variants.
If it doesn't exist and we find a build/${VARIANT} style path, then we
use that as the anchor.
In either case, the variant name is the final component of the build
path. The parse_build_path function had been separating that out, but it
was just put back onto the path again anyway by the only caller, and
then split out again when that path was consumed. We save a step by not
splitting it out in parse_build_path.
Change-Id: I8705b3dbb7664748f5525869cb188df70319d403
The AtomicWait event was not being woken up properly due to the
numPending count in the TBE not being decremented. This patch decrements
the count when Data is returned. Since that moves to a base state, the
TBE should no longer be needed.
Additionally added a transition which stalls and wait when an AtomicWait
occurs while in WI state so that it retries.
Change-Id: Ic8bfc700f9df3f95bea0799121898926a23d8163
Converts CLFLUSHOPT/WB/FLUSH operations from Write to Read operations
during address translation so that they don't trigger a page fault
when done on write-protected pages.
When restoring checkpoints for certain applications, gem5 tries to
create new doorbells with a pre-existing queue ID and simulation crashes
shortly after. This commit adds existing IDs to the GPU device's used
VMID map so that new doorbells are aware of existing queue IDs and use a
new ID. This ensures that queue IDs are unique after checkpoint
restoration
The CPU should not sleep with a pending virtual interrupt
if secure mode EL2 is supported (FEAT_SEL2)
Change-Id: Ib71c4a09d76a790331cf6750da45f83694946aee
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Similarly to the physical version [1], we rewrite the
masking logic to account for FEAT_SEL2.
The interrupt table is taken from the Arm architecture reference
manual (version DDI 0487H.a, section D1.3.6, table R_BKHXL)
[1]: https://github.com/gem5/gem5/pull/430
Change-Id: Icb6eb1944d8241293b3ef3c349b20f3981bcc558
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
There is no need to call the methods for every kind
of interrupt. A pending one should short-circuit the
remaining checks
Change-Id: I2c9eb680a7baa4644745b8cbe48183ff6f8e3102
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
With this patch we align virtual interrupts with respect to
the physical ones by introducing a matching takeVirtualInt
method.
Change-Id: Ib7835a21b85e4330ba9f051bc8fed691d6e1382e
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Vitual interrupts are enabled in secure mode as well
after the introduction of FEAT_SEL2. Replacing the
secure mode check with the EL2Enabled one
Change-Id: Id685a05d5adfa87b2a366f6be42bf344168927d4
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
https://github.com/gem5/gem5/pull/386 included two cases in
"src/dev/reg_bank.hh" where `std:: min` was used to compare a an integer
of type `size_t` and another of type `Addr`. This causes an error on my
Apple Silicon Mac as the comparison between an "unsigned long" and an
"unsigned long long" is not permitted. To fix this issue this patch
changes `reg_size` from `size_t` to `Addr`, as well as it the types of
the values it was derived from and the variable used to hold the return
from the `std::min` calls. While not completely correct typing from a
labelling perspective (`reg_bytes` is not an address), functions in
"src/dev/reg_bank.hh" already abuse `Addr` in this way frequently (for
example, `bytes` in the `write` function).
When restoring checkpoints for certain applications, gem5 tries to
create new doorbells with a pre-existing queue ID and simulation crashes
shortly after. This commit checkpoints the existing VMID map so that any
new doorbells after restoration use a unique queue ID
Change-Id: I9bf89a2769db26ceab4441634ff2da936eea6d6f
https://github.com/gem5/gem5/pull/386 included two cases in
"src/dev/reg_bank.hh" where `std:: min` was used to compare a an integer
of type `size_t` and another of type `Addr`. This cause an error on my
Apple Silicon Mac as this is a comparison between an "unsigned long"
and an "unsigned long long" which (at least on my setup) was not
permitted. To fix this issue the `reg_size` was changed from `size_t` to
`Addr`, as well as it the types of the values it was derived from and
the variable used to hold the return from the `std::min` calls.
Change-Id: I31e9c04a8e0327d4f6f5390bc5a743c629db4746
This dockerfile is used to *build* applications (e.g., from
gem5-resources) which can be run using full system mode in a GPU build.
The next releases disk image will use ROCm 5.4.2, therefore bump the
version from 4.2 to that version.
Again this is used to *build* input applications only and is not needed
to run or compile gem5 with GPUFS. For example:
$ docker build -t rocm54-build .
/some/gem5-resources/src/gpu/lulesh$ docker run --rm -u $UID:$GID -v \
${PWD}:${PWD} -w ${PWD} rocm54-build make
Change-Id: If169c8d433afb3044f9b88e883ff3bb2f4bc70d2
When restoring checkpoints for certain applications, gem5 tries to
create new doorbells with a pre-existing queue ID and simulation crashes
shortly after. This commit adds existing IDs to the GPU device's used
VMID map so that new doorbells are aware of existing queue IDs and use a
new ID. This ensures that queue IDs are unique after checkpoint
restoration
Change-Id: I9bf89a2769db26ceab4441634ff2da936eea6d6f
This pull request contains a set of small patches which fix some bugs in
the gem5 prefetchers, and aligns out-of-the box prefetcher performance
more closely with that which a typical user would expect.
The performance patches have been tested with an out-of-the-box
(untuned) Stride prefetcher configuration against a set of SPEC 2017
SimPoints, and show a modest IPC uplift across the board, with no IPC
degradation.
The new defaults were identified as part of work on gem5 prefetchers
undertaken by Nikolaos Kyparissas while on internship at Arm.
The comp_anr parameter is currently unused. Both parameters (comp_wu and
comp_anr) are set to false by default
Change-Id: If09567504540dbee082191d46fcd53f1363d819f
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Currently, the GPU SQC (L1I$) and TCP (L1D$) have a performance bug
where they do not behave correctly when multiple requests to the same
cache line overlap one another. The intended behavior is that if the
first request that arrives at the Ruby code for the SQC/TCP misses, it
should send a request to the GPU TCC (L2$). If any requests to the
same cache line occur while this first request is pending, they should
wait locally at the L1 in the MSHRs (TBEs) until the first request has
returned. At that point they can be serviced, and assuming the line
has not been evicted, they should hit.
For example, in the following test (on 1 GPU thread, in 1 WG):
load Arr[0]
load Arr[1]
load Arr[2]
The expected behavior (confirmed via profiling on real GPUs) is that
we should get 1 miss (Arr[0]) and 2 hits (Arr[1], Arr[2]) for such a
program.
However, the current support in the VIPER SQC/TCP code does not model
this correctly. Instead it lets all 3 concurrent requests go straight
through to the TCC instead of stopping the Arr[1] and Arr[2] requests
locally while Arr[0] is serviced. This causes all 3 requests to be
classified as misses.
To resolve this, this patch adds support into the SQC/TCP code to
prevent subsequent, concurrent requests to a pending cache line from
being
sent in parallel with the original one. To do this, we add an
additional transient state (IV) to indicate that a load is pending to
this cache line. If a subsequent request of any kind to the same cache
line occurs while this load is pending, the requests are put on the
local wait buffer and woken up when the first request returns to the
SQC/TCP. Likewise, when the first load is returned to the SQC/TCP, it
transitions from IV --> V.
As part of this support, additional transitions were also added to
account for corner cases such as what happens when the line is evicted
by another request that maps to the same set index while the first load
is pending (the line is immediately given to the new request, and when
the load returns it completes, wakes up any pending requests to the same
line, but does not attempt to change the state of the line) and how GPU
bypassing loads and stores should interact with the pending requests
(they are forced to wait if they reach the L1 after the pending,
non-bypassing load; but if they reach the L1 before the non-bypassing
load then they make sure not to change the state of the line from IV if
they return before the non-bypassing load).
As part of this change, we also move the MSHR behavior from internally
in the GPUCoalescer for loads to the Ruby code (like all other
requests). This is important to get correct hits and misses in stats
and other prints, since the GPUCoalescer MSHR behavior assumes all
requests serviced out of its MSHR also miss if the original request to
that line missed.
Although the SQC does not support stores, the TCP does. Thus,
we could have applied a similar change to the GPU stores at the TCP.
However, since the TCP support assumes write-through caches and does not
attempt to allocate space in the TCP, we elected not to add this support
since it seems to run contrary to the intended behavior (i.e., the
intended behavior seems to be that writes just bypass the TCP and thus
should not need to wait for another write to the same cache line to
complete).
Additionally, making these changes introduced issues with deadlocks at
the TCC. Specifically, some Pannotia applications have accesses to the
same cache line where some of the accesses are GLC (i.e., they bypass
the GPU L1 cache) and others are non-GLC (i.e., they want to be cached
in the GPU L1 cache). We have support already per CU in the above code.
However, the problem here is that these requests are coming from
different CUs and happening concurrently (seemingly because different
WGs are at different points in the kernel around the same time).
This causes a problem because our support at the TCC for the TBEs
overwrites the information about the GPU bypassing bits (SLC, GLC) every
time. The problem is when the second (non-GLC) load reaches the TCC, it
overwrites the SLC/GLC information for the first (GLC) load. Thus, when
the the first load returns from the directory/memory, it no longer has
the GLC bit set, which causes an assert failure at the TCP.
After talking with other developers, it was decided the best way handle
this and attempt to model real hardware more closely was to move the
point at which requests are put to sleep on the wakeup buffer from the
TCC to the directory. Accordingly, this patch includes support for that
-- now when multiple loads (bypassing or non-bypassing) from different
CUs reach the directory, all but the first one will be forced to wait
there until the first one completes, then will be woken up and
performed. This required updating the WTRequestor information at the
TCC to pass the information about what CU performed the original request
for loads as well (otherwise since the TBE can be updated by multiple
pending loads, we can't tell where to send the final result to). Thus,
I changed the field to be named CURequestor instead of WTRequestor since
it is now used for more than stores. Moreover, I also updated the
directory to take this new field and the GLC information from incoming
TCC requests and then pass that information back to the TCC on the
response -- without doing this, because the TBE can be updated by
multiple pending, concurrent requests we cannot determine if this memory
request was a bypassing or non-bypassing request. Finally, these
changes introduced a lot of additional contention and protocol stalls at
the directory, so this patch converted all directory uses of z_stall to
instead put requests on the wakeup buffer (and wake them up when the
current request completes) instead. Without this, protocol stalls cause
many applications to deadlock at the directory.
However, this exposed another issue at the TCC: other applications
(e.g., HACC) have a mix of atomics and non-atomics to the same cache
line in the same kernel. Since the TCC transitions to the A state when
an atomic arrives. For example, after the first pending load returns to
the TCC from the directory, which causes the TCC state to become V, but
when there are still other pending loads at the TCC. This causes invalid
transition errors at the TCC when those pending loads return, because
the A state thinks they are atomics and decrements the pending atomic
count (plus the loads are never sent to the TCP as returning loads).
This patch fixes this by changing the TCC TBEs to model the number of
pending requests, and not allowing atomics to be issued from the TCC
until all prior, pending non-atomic requests have returned.
Change-Id: I37f8bda9f8277f2355bca5ef3610f6b63ce93563
- resources field in workload now supports a dict with resources id and
version.
- Older workload JSON are still supported but added a deprecation waring
In a high performance CPU there is no other way than a BTB hit
to know about a branch instruction and its type. For low-end CPU's
pre-decoding might sit in from of the BPU to provide this information.
Currently, the BPU models only low-end behavior and updates the
RAS and the indirect branch prediction even without a BTB hit.
This patch adds two things to model the correct behavior for high-end
CPUs.
1. A check before the RAS and indirect predictor wheather there was
a BTB hit or not. Only for BTB hits the BPU will consolidate RAS, and
indirect predictor.
2. Since, this check requires a BTB hit for indirect branches they must
also be installed into the BTB. For returns this was already done.
Change-Id: Ibef9aa890f180efe547c82f41fc71f457c988a89
Signed-off-by: David Schall <david.schall@ed.ac.uk>
This commit optimizes the address generation logic in the strided
prefetcher by introducing the following changes
(d is the degree of the prefetcher)
* Evaluate the fixed prefetch_stride only once (and not d-times)
* Replace 2d multiplications (d * prefetch_stride and distance *
prefetch_stride) with additions by updating the new base prefetch
address while looping
Change-Id: I49c52333fc4c7071ac3d73443f2ae07bfcd5b8e4
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Reviewed-by: Tiberiu Bucur <tiberiu.bucur@arm.com>
The Stride Prefetcher will skip this number of strides ahead of the
first identified prefetch, then generate `degree` prefetches at
`stride` intervals. A value of zero indicates no skip (i.e. start
prefetching from the next identified prefetch address).
This parameter can be used to increase the timeliness of prefetches by
starting to prefetch far enough ahead of the demand stream to cover
the memory system latency.
[Richard Cooper <richard.cooper@arm.com>:
- Added detail to commit comment and `distance` Param documentation.
- Changed `distance` Param from `Param.Int` to `Param.Unsigned`.
]
Change-Id: I6c4e744079b53a7b804d8eab93b0f07b566f0c08
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Signed-off-by: Richard Cooper <richard.cooper@arm.com>
Currently, the GPU SQC (L1I$) and TCP (L1D$) have a performance bug
where they do not behave correctly when multiple requests to the same
cache line overlap one another. The intended behavior is that if the
first request that arrives at the Ruby code for the SQC/TCP misses, it
should send a request to the GPU TCC (L2$). If any requests to the
same cache line occur while this first request is pending, they should
wait locally at the L1 in the MSHRs (TBEs) until the first request has
returned. At that point they can be serviced, and assuming the line
has not been evicted, they should hit.
For example, in the following test (on 1 GPU thread, in 1 WG):
load Arr[0]
load Arr[1]
load Arr[2]
The expected behavior (confirmed via profiling on real GPUs) is that
we should get 1 miss (Arr[0]) and 2 hits (Arr[1], Arr[2]) for such a
program.
However, the current support in the VIPER SQC/TCP code does not model
this correctly. Instead it lets all 3 concurrent requests go straight
through to the TCC instead of stopping the Arr[1] and Arr[2] requests
locally while Arr[0] is serviced. This causes all 3 requests to be
classified as misses.
To resolve this, this patch adds support into the SQC/TCP code to
prevent subsequent, concurrent requests to a pending cache line from being
sent in parallel with the original one. To do this, we add an
additional transient state (IV) to indicate that a load is pending to
this cache line. If a subsequent request of any kind to the same cache
line occurs while this load is pending, the requests are put on the
local wait buffer and woken up when the first request returns to the
SQC/TCP. Likewise, when the first load is returned to the SQC/TCP, it
transitions from IV --> V.
As part of this support, additional transitions were also added to
account for corner cases such as what happens when the line is evicted
by another request that maps to the same set index while the first load
is pending (the line is immediately given to the new request, and when
the load returns it completes, wakes up any pending requests to the same
line, but does not attempt to change the state of the line) and how GPU
bypassing loads and stores should interact with the pending requests
(they are forced to wait if they reach the L1 after the pending,
non-bypassing load; but if they reach the L1 before the non-bypassing
load then they make sure not to change the state of the line from IV if
they return before the non-bypassing load).
As part of this change, we also move the MSHR behavior from internally
in the GPUCoalescer for loads to the Ruby code (like all other
requests). This is important to get correct hits and misses in stats
and other prints, since the GPUCoalescer MSHR behavior assumes all
requests serviced out of its MSHR also miss if the original request to
that line missed.
Although the SQC does not support stores, the TCP does. Thus,
we could have applied a similar change to the GPU stores at the TCP.
However, since the TCP support assumes write-through caches and does not
attempt to allocate space in the TCP, we elected not to add this support
since it seems to run contrary to the intended behavior (i.e., the
intended behavior seems to be that writes just bypass the TCP and thus
should not need to wait for another write to the same cache line to
complete).
Additionally, making these changes introduced issues with deadlocks at
the TCC. Specifically, some Pannotia applications have accesses to the
same cache line where some of the accesses are GLC (i.e., they bypass
the GPU L1 cache) and others are non-GLC (i.e., they want to be cached
in the GPU L1 cache). We have support already per CU in the above code.
However, the problem here is that these requests are coming from
different CUs and happening concurrently (seemingly because different
WGs are at different points in the kernel around the same time).
This causes a problem because our support at the TCC for the TBEs
overwrites the information about the GPU bypassing bits (SLC, GLC) every
time. The problem is when the second (non-GLC) load reaches the TCC, it
overwrites the SLC/GLC information for the first (GLC) load. Thus, when
the the first load returns from the directory/memory, it no longer has
the GLC bit set, which causes an assert failure at the TCP.
After talking with other developers, it was decided the best way handle
this and attempt to model real hardware more closely was to move the
point at which requests are put to sleep on the wakeup buffer from the
TCC to the directory. Accordingly, this patch includes support for that
-- now when multiple loads (bypassing or non-bypassing) from different
CUs reach the directory, all but the first one will be forced to wait
there until the first one completes, then will be woken up and
performed. This required updating the WTRequestor information at the
TCC to pass the information about what CU performed the original request
for loads as well (otherwise since the TBE can be updated by multiple
pending loads, we can't tell where to send the final result to). Thus,
I changed the field to be named CURequestor instead of WTRequestor since
it is now used for more than stores. Moreover, I also updated the
directory to take this new field and the GLC information from incoming
TCC requests and then pass that information back to the TCC on the
response -- without doing this, because the TBE can be updated by
multiple pending, concurrent requests we cannot determine if this memory
request was a bypassing or non-bypassing request. Finally, these
changes introduced a lot of additional contention and protocol stalls at
the directory, so this patch converted all directory uses of z_stall to
instead put requests on the wakeup buffer (and wake them up when the
current request completes) instead. Without this, protocol stalls cause
many applications to deadlock at the directory.
However, this exposed another issue at the TCC: other applications
(e.g., HACC) have a mix of atomics and non-atomics to the same cache
line in the same kernel. Since the TCC transitions to the A state when
an atomic arrives. For example, after the first pending load returns to
the TCC from the directory, which causes the TCC state to become V, but
when there are still other pending loads at the TCC. This causes invalid
transition errors at the TCC when those pending loads return, because
the A state thinks they are atomics and decrements the pending atomic
count (plus the loads are never sent to the TCP as returning loads).
This patch fixes this by changing the TCC TBEs to model the number of
pending requests, and not allowing atomics to be issued from the TCC
until all prior, pending non-atomic requests have returned.
Change-Id: I37f8bda9f8277f2355bca5ef3610f6b63ce93563
The current GPU TCP (L1D$) Ruby SLICC code had a bug where a GPU
load that wants to bypass the L1D$ (e.g., GLC or SLC bit was set)
but the line is in Invalid when that request arrives, results in
a non-bypassing load being sent to the GPU TCC (L2$) instead of
a bypassing load.
This issue was not caught by currently nightly or weekly tests,
because the tests do not test for correctness in terms of hits
and misses in the caches. However, tests for these corner cases
expose this issue.
To fix, this, this patch removes the check that the entry is valid
when deciding what to do with a bypassing GPU load -- since the
TCP Ruby code has transitions for bypassing loads in both I and V,
we can simply call the LoadBypassEvict event in both cases and the
appropriate transition will handle the bypassing load given the
cache line's current state in the TCP.
Change-Id: Ia224cefdf56b4318b2bcbd0bed995fc8d3b62a14
Print extra logs for the full/partial read/write access to the registers
through the register bank. The debug flag is empty by default and would
not print anything.
Test: run unittest of dev/reg_bank.test.xml to check the behavior would
not affect the original functionality.
run gem5 with debug flags and use m5term to poke on registers.
mtvec.mode is extended in the new riscv proposal, like fast interrupt.
This change moves that part from Fault class to ISA class for
extendable.
Ref: https://github.com/riscv/riscv-fast-interrupt
Since returned data is not needed for AtomicNoReturn and Store memory
requests, the coalescer need not spend time writing in dummy data for
packets of these types.
Change-Id: Ie669e8c2a3bf44b5b0c290f62c49c5d4876a9a6a
I believe the weekly test failures (example:
https://github.com/gem5/gem5/actions/runs/6832805510/job/18592876184)
are due to a container running out of memory when running the very-long
x86 boot tests. I found that the `-t $(nproc)` flag meant, on our
runners, 4 x86 full system gem5 simulations were being pawned. Locally I
found these gem5 x86 boot sims can reach 4GB in size so I suspect they
eventually grew big enough exceed the 16GB memory of the VM.
I have removed `-t $(nproc)` meaning each execution to see if this fixes
the issue (we may want to use `-t 2` later if the Weeklies take too long
running single-threaded).
Recent breaking changes in the DRAMSys API require user code to be
updated. These updates have been applied to the gem5 integration.
Furthermore, as DRAMSys started to use CMake dependency management,
it is no longer sensible to maintain two separate build systems for
DRAMSys. The use of the DRAMSys integration in gem5 will therefore
from now on require that CMake is installed on the target machine.
Additionally, support for snapshots have been implemented into DRAMSys
and coupled with gem5's checkpointing API.
This commit fixes a violation of the TLM2.0 protocol as well as a
bug regarding back-pressure:
- In the BEGIN_REQ phase, transaction objects are required to set
their response status to TLM_INCOMPLETE_RESPONSE. This was not
the case in the packet2payload function that converts gem5 packets
to TLM2.0 generic payloads.
- When the target applies back-pressure to the initiator, an assert
condition was triggered as soon as the response is retried. The
cause of this was an unintentional nullptr-access into a map.
Augmenting Datablock and WriteMask to support optional arg to
distinguish between return and no return. In the case of atomic no
return requests, log entries should not be created when performing the
atomic.
Change-Id: Ic3112834742f4058a7aa155d25ccc4c014b60199a
Added a resource constraint, AtomicALUOperation, to GLC atomics
performed in the TCC.
The resource constraint uses a new class, ALUFreeList array. The class
assumes the following:
- There are a fixed number of atomic ALU pipelines
- While a new cache line can be processed in each pipeline each cycle,
if a cache line is currently going through a pipeline, it can't be
processed again until it's finished
Two configuration parameters have been used to tune this behavior:
- tcc-num-atomic-alus corresponds to the number of atomic ALU pipelines
- atomic-alu-latency corresponds to the latency of atomic ALU pipelines
Change-Id: I25bdde7dafc3877590bb6536efdf57b8c540a939
pkt->req->isCacheMaintenance() would not include a check
for clean eviction before notifying the prefetcher,
causing gem5 to crash.
Change-Id: I1d082a87a3908b1ed46c5d632d45d8b09950b382
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Update the default prefetch options to achieve out-of-the box
prefetcher performance closer to that which a typical user would
expect. Configurations that set these parameters explicitly will be
unaffected.
The new defaults were identified as part of work on gem5 prefetchers
undertaken by Nikolaos Kyparissas while on internship at Arm.
Change-Id: Id63868c7c8f00ee15a0b09a6550780a45ae67e55
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Prefetch queue entries were being squashed by comparing the address
of each queued prefetch against the block address of the demand
access. Only prefetches that happen to fall on a cache-line block
boundary would be squashed.
This patch converts the prefetch addresses to block addresses before
comparison.
Change-Id: I55ecb4919e94ad314b91c7795bba257c550b1528
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
While we do run compiler tests weekly, 9/10 the issue is a strict check
in clang we did not check before incorporating code into the codebase.
Therefore, running a clang compilation as part of our CI would help us
catch errors quicker.
This reverts gem5#133, the temporary work-around for gem5#131, allowing
both SLC and GLC atomic requests to be made in the GPU tester.
The underlying issues behind gem5#131 have been resolved by gem5#367 and
gem5#397.
The decode_inst_dep_trace.py opens the trace file in read mode, and
subsequently reads the magic number from the trace file. Once the number
is read, it is compared against the string 'gem5' without decoding it
first. This causes the comparison to fail.
The fix addresses this by calling the decode() routine on the output of
the read() call. Please find the details in the associated issue #543
When compiling GCC-9 gem5 the gem5 object files are near double the size
than when compiling with other GCC versions. This increase in size means
we need >16GB of memory available when linking. As we do not want to
mandate >16GB systems for building gem5, we are going to drop GCC-9. The
exact cause of this bug unknown. This is highlighted in Issue #555.
mem-ruby, gpu-compute: fix formatting of TCC
Fix several not properly indented prints and extraneous extra lines in
the SLICC code for the GPU TCC (L2 cache).
mem-ruby, gpu-compute: fix typo in GPU coalescer deadlock print
The GPU Coalescer's deadlock print did not previously print a newline at
the end of each deadlock, which caused confusion when there were
multiple deadlocks as each deadlock print would appear to go with the
address after it. This patch fixes this issue.
gpu-compute: Fix typo with GPUTLB print
Print was not properly ending in a newline, which caused confusion when
looking a trace with GPUTLB enabled. This fixes that.
mem-ruby, gpu-compute: fix GPU SQC/TCP Ruby formatting
Fix several not properly indented prints and extraneous extra lines in
the SLICC code for the GPU SQC (L1I$) and TCP (L1D$).
Symbol type is part of the info provided by an ELF object's symtab.
It indicates whether a symbol is a file symbol, or a function symbol,
etc.
This chain of commits introduces a way to only load function symbols
to the gem5's symbol table. The RISC-V BootloaderKernelWorkload now
loads only function symbols from the bootloader and the kernel binaries
by default.
arch-riscv: Fix implementation of CMO extension instructions
This change introduces a template for store instruction's mem access.
The new template is called CacheBlockBasedStore.
The reasons for not reusing the current Store's mem access template
are as follows,
- The CMO extension instructions operate on cache block size
granularity,
while regular load/store instructions operate on data of size 64 bits or
fewer.
- The writeMemAtomicLE/writeMemTimingLE interfaces do not allow passing
nullptr as data. However, CPUs in gem5 rely on (data == NULL) to detect
CACHE_BLOCK_ZERO instructions. Setting `Mem = 0;` to `uint64_t Mem;`
does not solve the problem as the reference is allocated and thus,
it's always true that `&Mem != NULL`. This change uses the
writeMemAtomic/writeMemTiming interfaces instead.
- Per CMO v1.0.1, the instructions in the spec do not generate
address misaligned faults.
- The CMO extension instructions do not use IMM.
---
arch-riscv: Fix generateDisassembly for Store with 1 source reg
Currently, store instructions are assumed to have two source registers.
However, since we are supporting the RISC-V CMO instructions, which
are Store instructions in gem5 but they only have one source register.
This change allows printing disassembly of Store instructions with
one source register.
---
arch-riscv: Make Zicbom/Zicboz extensions optional in FS mode
Currently, we're enable Zicbom/Zicboz by default. Since those
extensions might be buggy as they are not well-tested, making
those entensions optional allows running simulation where
the performance implication of the instructions do not matter.
Effectively, by turning off the extensions, we simply remove
those extensions from the device tree, so the OS would not
use them. It doesn't prohibit the userspace application to
use those instructions, however.
---
arch-riscv: Add all supporting Z extensions to RISC-V isa string
When compiling GCC-9 gem5 the gem5 object files are near double the size
than when compiling with other GCC versions. This increase in size means
we need >16GB of memory available when linking. As we do not want to
mandate >16GB systems for building gem5, we are going to drop GCC-9. The
exact cause of this bug unknown.
Change-Id: I43744d421b88b79ccb21a76badd6b525e894e973
mem-ruby: update RubyRequest print to include GPU fields
The print function used for RubyRequests did not include the GPU
specific fields (for the GLC and SLC bits, which are cache modifiers
that specify what level of the memory hierarchy a request should be
performed at). This causes confusion when the GPU Ruby SLICC code prints
out RubyRequest messages, since important fields are missing.
Thus this commit adds that support. Since these fields are already part
of the RubyRequest class, and are always 0 for non-GPU requests, it
should not affect other components beyond slightly longer prints.
Change-Id: I31c9122b82dfa2c6415ce25d225ea82cb35c7333
Made the following changes to fix the behavior of GLC atomics in a WB
L2:
- Stored atomic write mask in TBE For GLC atomics on an invalid line
that bypass to the directory, but have their atomics performed on the
return path.
- Replaced !presentOrAvail() check for bypassing atomics to directory
(which will then be performed on return path), with check for invalid
line state.
- Replaced wdb_writeDirtyBytes action used when performing atomics with
owm_orWriteMask action that doesn't write from invalid atomic request
data block
- Fixed atomic return path actions
Change-Id: I6a406c313d2f9c88cd75bfe39187ef94ce84098f
This change updates the gem5 SST bridge to call m5.instantiate() in the
gem5 config script instead of in the SST component. This allows more
flexibility for the gem5-SST setup, as we can now write traffic
generators using the bridge.
Previously the GPU L1 I$ (SQC) was not updating the MRU information on
hits in the SQC. This commit resolves that by adding support to the
appropriate Ruby transition.
Previously, opening a config file (such as
`configs/example/hmc_hello.py`) containing non-ASCII characters causes
UnicodeDecodeError.
Also switch to use more an more idiomatic context manager for handling
files.
Change-Id: Ia39cbe2c420e9c94f3a84af459b7e5f4d9718d14
An integer division in the compression:Base:getSize() was being done,
which led to rounding down instead of up.
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
The print function used for RubyRequests did not include the GPU
specific fields (for the GLC and SLC bits, which are cache modifiers
that specify what level of the memory hierarchy a request should be
performed at). This causes confusion when the GPU Ruby SLICC code
prints out RubyRequest messages, since important fields are missing.
Thus this commit adds that support. Since these fields are already
part of the RubyRequest class, and are always 0 for non-GPU requests,
it should not affect other components beyond slightly longer prints.
Change-Id: I31c9122b82dfa2c6415ce25d225ea82cb35c7333
SST SimpleMem will be deprecated in SST 14. PR 396 updated the
bridge to use StandardMem, which is the new memory interface in
SST. This change removes all references to SimpleMem.
Change-Id: I6e4d645317d95ebb610e3dfc93a30d53b91b6b5d
Signed-off-by: Kaustav Goswami <kggoswami@ucdavis.edu>
This change updates the gem5 SST bridge to call m5.instantiate()
in the gem5 config script instead of in the SST component. This
allows more flexibility for the gem5-SST setup, as we can now write
traffic generators using the bridge.
Change-Id: I510a8c15f8fb00bdbdd60dafa2d9f5ad011e48f2
Signed-off-by: Kaustav Goswami <kggoswami@ucdavis.edu>
- resources field in workload now supports a dict with resources id
and version.
- Older workload JSON are still supported but added a deprecation waring
Change-Id: I137dbb99799a5294e84ce7d5d914f05e4cfe9e00
GPR allocation is using fields in the AMD kernel code structure which
are not backwards compatible and are not populated in more recent
compiler versions. Use the granulated fields instead which is enfored to
be backwards compatible.
Change-Id: I718716226f5dbeb08369d5365d5e85b029027932
This fixes occasional readBlob fatals caused by the functional read of
system memory, seen often with the KVM CPU.
Change-Id: Ifccee666f62faa5b2fcf0a64a9d77c8cf95b3add
The amdgpu driver can, at *any* time, tell the device to unmap a queue
to force the queue descriptor to be written back to main memory in the
form of a memory queue descriptor (MQD). It will then immediately remap
the queue and continue writing the doorbell to the queue. It is possible
that the doorbell write occurs after the queue is unmapped but before it
is remapped. In this situation, we need to check the updated value of
the doorbell for the queue and write that to the queue after it is
mapped.
To handle this, a pending doorbell packet map is created to hold a
packet to replay when the queue is mapped. Because PCI in gem5
implements only the atomic protocol port, we cannot use the original
packet as it must respond in the same Tick. This patch fixes issues with
the doorbell maps not being cleared on unmapping to ensure the doorbell
is not found in writeDoorbell and places in the pending doorbell map.
This includes fixing the doorbell offset value in the doorbell to VMID
map which was is now multiplied by four as it is a dword address.
This was tested using tensorflow 2.0's MNIST example which was seeing
this issue consistently. With this patch it now makes progress and does
issue pending doorbell writes.
Change-Id: Ic6b401d3fe7fc46b7bcbf19a769cdea6814e7d1e
gem5 does not currently implement any vendor-specific HSA packets.
Starting in ROCm 5.5, vendor packets appear to end with a completion
signal. Not sending this completion causes gem5 to hang. Since these
packets are not documented anywhere and need to be reverse engineered we
send the completion signal, if non-zero, and finish the packet as is the
current behavior.
Testing: HIP examples working on most recent ROCm release (5.7.1).
Change-Id: Id0841407bec564c84f590c943f0609b17e01e14c
Currently, we're enable Zicbom/Zicboz by default. Since those
extensions might be buggy as they are not well-tested, making
those entensions optional allows running simulation where
the performance implication of the instructions do not matter.
Effectively, by turning off the extensions, we simply remove
those extensions from the device tree, so the OS would not
use them. It doesn't prohibit the userspace application to
use those instructions, however.
Change-Id: Ib30e98c4c39f741dec5f7d31bd7b832391686840
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
Currently, store instructions are assumed to have two source registers.
However, since we are supporting the RISC-V CMO instructions, which
are Store instructions in gem5 but they only have one source register.
This change allows printing disassembly of Store instructions with
one source register.
Change-Id: I4dd7818c9ac8a89d5e10e77db72248942a25e938
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
This change introduces a template for store instruction's mem access.
The new template is called CacheBlockBasedStore.
The reasons for not reusing the current Store's mem access template
are as follows,
- The CMO extension instructions operate on cache block size granularity,
while regular load/store instructions operate on data of size 64 bits or
fewer.
- The writeMemAtomicLE/writeMemTimingLE interfaces do not allow passing
nullptr as data. However, CPUs in gem5 rely on (data == NULL) to detect
CACHE_BLOCK_ZERO instructions. Setting `Mem = 0;` to `uint64_t Mem;`
does not solve the problem as the reference is allocated and thus,
it's always true that `&Mem != NULL`. This change uses the
writeMemAtomic/writeMemTiming interfaces instead.
- Per CMO v1.0.1, the instructions in the spec do not generate
address misaligned faults.
- The CMO extension instructions do not use IMM.
Change-Id: I323615639a4ba882fe40a55ed32c7632e0251421
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
Symbol type is part of the info provided by an ELF object's symtab.
It indicates whether a symbol is a file symbol, or a function symbol, etc.
Change-Id: I827e79f8439c47ac9e889734aaf354c653aff530
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
Currently, we are hardcoding the ISA string in the device tree
generator. The ISA string from the device tree affects which
ISA extensions will be used by the bootloader/kernel.
This function allows generating the ISA string from the gem5's
ISA object rather than using hardcoded values.
This series of changes also correct a couple of hardcoded
RISC-V ISA strings in the standard library, as well as not
enable RVV instructions for the U74 core model.
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
This PR is fixing https://github.com/gem5/gem5/issues/449 by applying
the following changes
1) Setting up alloc_on_atomic=False in the stdlib
This is directly related to the error message reported by the Issue #449
2) Disabling far atomics in stdlib with policy type = 0
There is an invalid transaction error, likely caused by the fact the
current implementation
is expecting a 2 level cache hierarchy whereas the stdlib example only
allocates one
level of caches (L1). This needs further investigation
3) Explicitly clearing the atomic log
Even by disabling far atomics, the execution of atomicPartial was
populating
the atomic log queue without ever clearing it. This caused the OOM
killer in Linux
to detect the leak and to kill it when the physical resources of the
machine no longer
sufficed. IMHO the atomic log interface should be revamped as atomic
users should
be allocating the atomic log only if explicitly needed
Currently, the kernel's symbols are shifted by `kernel_paddr_offset`,
which is where the kernel is located in the physcial address space.
However, the symbols are mapped to virtual addresses, which stay the
same even though the physical address space is shifted.
This patch removes the offset for the kernel's symbols virtual
addresses.
Change-Id: I7c35f925777220f56bd8c69bba14c267d2048ade
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
As pointed out by [1], Arm doesn't seem to respect the cacheability
attribute when mapping uncacheable memory. This is because the request
is not tagged as uncacheable during SE translation With this patch we
are checking for the cacheability attribute before finalizing
translation
[1]: https://github.com/gem5/gem5/issues/509
Change-Id: I42df0e119af61763971d5766ae764a540055781b
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This is a temporary solution to fix daily tests. We could revert
to the default (policy_type = 1) once the problem is properly
fixed
Change-Id: Ia80af9a7d84d5c777ddeb441110a91a1680c1030
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The new far atomics implementation [1] didn't take into consideration
it was supposed to manually clear the atomic log. This caused a
memory leak where the log queue was getting bigger and bigger
as no cleaning was happening
[1]: https://github.com/gem5/gem5/pull/177
Change-Id: I4a74fbf15d21e35caec69c29117e2d98cc86d5ff
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
gem5 will otherwise fatal with the error message:
fatal: ... alloc_on_atomic without default or user set value
See github issue [1] for further details
[1]: https://github.com/gem5/gem5/issues/449
Change-Id: I0bb8fccf0ac6d60fc6c1229436a35e91b2fb45cd
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Major refactoring of the branch predictor unit.
- Clearer control flow of the main branch predictor
- Remove `uncondBranch` and `btbUpdate` functions in favor
of a common `historyUpdate` function. There is now only
one lookup function for conditional branches and the new
`historyUpdate` for speculative history update.
- Added a new *target provider* class.
- More expressive statistics depending on the different branch
types.
- Cleanup the branch history management
Current hardcoded value does not support vector instructions.
The new ISA string generator function allows the flexibility
of using or not using the vector extension.
Change-Id: Ic78c4b6629ad3813fc172f700d77ea956552e613
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
Currently, we are hardcoding the ISA string in the device tree
generator. The ISA string from the device tree affects which
ISA extensions will be used by the bootloader/kernel.
This function allows generating the ISA string from the gem5's
ISA object rather than using hardcoded values.
Change-Id: I2f3720fb6da24347f38f26d9a49939484b11d3bb
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
Unfortunately Actions uses docker contaienrs to create files on the
system with root permissions. The 'vagrant' user which we login to run
the Actions Runner, can't remove these files. However, 'vagrant' is part
of the sudo group and can therefore use sudo to remove these files.
I don't like this, but it works.
Major refactoring of the branch predictor unit.
- Clearer control flow of the main branch predictor
- Remove `uncondBranch` and `btbUpdate` functions in favour of a
common `historyUpdate` function. There is now only one lookup
function for conditional branches and the new `historyUpdate` for
speculative history update.
- Added a new *target provider* class.
- More expressive statistics depending on the different branch types.
- Cleanup the branch history management
Change-Id: I21fa555b5663e4abad7c836fc1d41a9c8b205263
Signed-off-by: David Schall <david.schall@ed.ac.uk>
Some debug registers were incorrectly tagged
(e.g. as being writeable). This was causing a bug in some gem5-KVM runs
where gem5 was trying to initialize the state of those registers
(OSLSR_EL1) [1] but KVM was returning an error (as the registers were
RO).
[1]: https://github.com/gem5/gem5/blob/stable/\
src/arch/arm/kvm/armv8_cpu.cc#L408
Capstone is an open source disassembler [1] already used by
other projects (like QEMU).
gem5 is already capable of disassembling instructions. Every StaticInst
is supposed to define a generateDisassembly method which returns the
instruction mnemonic (opcode + operand list) as a string.
This "distributed" implementation of a disassembler relies
on the developer to properly populate the metadata fields
of the base instruction class.
The growing complexity of the ISA code and the massive reuse
of base classes beyond their intended use has led to a
disassembling logic which contains several bugs.
By allowing a tracer to rely on a third party disassembler, we fill the
instruction trace with a more trustworthy instruction stream.
This will make any trace parsing tool to work better and it will
also allow us to spot/fix our own bugs by comparing instruction
traces with native vs custom disassembler
[1]: http://www.capstone-engine.org/
Capstone is an open source disassembler [1] already used by
other projects (like QEMU).
gem5 is already capable of disassembling instructions. Every StaticInst
is supposed to define a generateDisassembly method which returns the
instruction mnemonic (opcode + operand list) as a string.
This "distributed" implementation of a disassembler relies
on the developer to properly populate the metadata fields
of the base instruction class.
The growing complexity of the ISA code and the massive reuse
of base classes beyond their intended use has led to a
disassembling logic which contains several bugs.
By allowing a tracer to rely on a third party disassembler, we fill the
intruction trace with a more trustworthy instruction stream.
This will make any trace parsing tool to work better and it will
also allow us to spot/fix our own bugs by comparing instruction
traces with native vs custom disassembler
[1]: http://www.capstone-engine.org/
Change-Id: I3c4db5072c03d2731265d0398d3863c101dcb180
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
We move it to the child class which is what the TarmacTracer
actually uses.
Change-Id: Ia30892723d2e1f7306dae87c6c9c1d69d00ad73d
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
We want to be able to configure from python the disassembler
used by an instruction tracer. The default/base version will
reuse existing instruction logic and it will simply
call the StaticInst::disassemble method.
Change-Id: Ieb16f059a436757c5892dcc82882f6d42090927f
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The scons function Detect will return the program name if the program
is exists in the system. However, the HAVE_PKG_CONFIG is used to
check the pkg-config program is exists and it should be the boolean
type.
Change-Id: I18c4813d36eea68b8851a41db41777bdb2a80b7b
Currently the type of HAVE_DEPRECATED_NAMESPACE is used to detect
if the compiler support gnu::deprecated feature. The return type
of conf.TryCompile is int, but HAVE_DEPRECATED_NAMESPACE is used
as boolean type. The CL is add bool type caster to ensure the type
of it is boolean.
Change-Id: Ife7d9716e485a8be8722d58776f064e7c2268a30
At this moment, VLEN and ELEN RVV parameters are set as constants that
need to be modified at compile time if you want to experiment with
different values. With this patch, I want to set a first point to
discuss how to configure these parameters dynamically.
Also, I have modified some data types that were provoking wrong
behaviour in particular instructions when using a large enough VLEN
value in the considered range inside the specification.
Adds the LULESH GPU Tests to our GitHub Actions infrastructure
Co-authored-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Co-authored-by: Harshil Patel <harshilp2107@gmail.com>
This PR aims to enhance our Docker image build and registry management
by implementing multi-platform support and migrating from the Google
Docker registry to the GitHub Container Registry. Issue:
[#336](https://github.com/gem5/gem5/issues/336)
This patch fixes the size of the memory acceses in vswhole and
vlwhole instructions to the maximum vector length.
Change-Id: Ib86b5356d9f1dfa277cb4b367893e3b08242f93e
This patch adds elen as a member of vector configuration instructions so it can be used with the especulative execution
Change-Id: Iaf79015717a006374c5198aaa36e050edde40cee
In first place, vlen is added as a member of Vector Macro Instructions
where it is needed to split the instruction in Micro Instructions.
Then, new PCState methods are used to get dynamic vlen and vlenb
values at execution.
Finally, vector length data types are fixed to 32 bits so every vlen value
is considered.
Change-Id: I5b8ceb0d291f456a30a4b0ae2f58601231d33a7a
This patch add vlen definition to the riscv decoder so it can be used in Vector Instruction Constructors
Change-Id: I52292bc261c43562b690062b16d2b323675c2fe0
This path redefine VecRegContainer for RISCV so it can hold every VLEN + ELEN possible configuration used at execution time
Change-Id: Ie6abd01a1c4ebe9aae3d93f4e835fcfdc4a82dcd
Adds the following hooks:
1. `check-ast`: Verifies all Python files have a AST indicating they are
valid Python.
2. `check-merge-conflict`: Checks to see if files have merge conflict
strings and blocks commits if so.
3. `check-symlinks`: Checks that symlinks in the repo still point to a
valid location.
4. `destroyed-symlinks`: Checks if symlink is replaced with a file and
if that file is identical to the file it was previously pointing too.
None of these commits change any code. They are all checks to ensure bad
code is not committed.
Misc Regs might contain rather important information about the state of
a core, e.g., information in CSR registers.
This patch enforces copying the CSR registers when switching cpus. The
bug and the proposed fix are reported here [1].
[1] https://github.com/gem5/gem5/issues/451
Change-Id: I611782e6e3bcd5530ddac346342a9e0e44b0f757
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
This PR adds the following the GitHub Actions runners:
1. Enables KVM to be run within docker containers within the VMs, if
permitted. Now, any Docker containers wanting to use KVM must create
containers with the `--device /dev/kvm` argument. This may make it hard
or impossible to utilize with GitHub Actions. Nonetheless it is enabled.
2. Improves the docker prune step (a cleanup step carried out after each
job) so it now removes all the Docker images in the VM.
3. Adds the "halt-helper.sh" script which automatically, and safely,
halts (shutsdown) all the VMs so maintenance tasks can be undertaken.
The return address stack (RAS) is restructured to be a separate SimObject.
This enables disabling the RAS and better separation of the functionality.
Furthermore, easier statistics and debugging.
Change-Id: I8aacf7d4c8e308165d0e7e15bc5a5d0df77f8192
Signed-off-by: David Schall <david.schall@ed.ac.uk>
This patch:
1. Adds setup scripting to "provision_root.sh" to setup and enable KVM,
for the 'vagrant' user, for VMs which are capable of this.
2. Runs a check on each VM to see if KVM can be run sucessfully within a
docker container. If so, the GitHub Actions runner is given a 'kvm'
label. It is unknown at this time if GitHub Runners can utlized KVM
but it is open to their processes.
Change-Id: Idfcbb7bfa3e5b7cc47d29aea50fb1ebcafdb7acc
Without `--all` `docker prune --force --volumes` will remove everything
exception non-dangling images. For an image to be considered dangling it
must be untagged and/or not used by a container at that time. As most of
the images we download are tagged (e.g., `:latest`) then most of our
images are never removed without the inclusion of `--all` which will
remove any image not currently used by a container.
Images were starting to accumulate on runners. This will ensure they do
not and are cleaned after each job run.
Change-Id: I6d8441a11d22fdcf827e9c44422dbcf02cf600e0
This is similar to [1] and [2].
Essentially, the VS bits of STATUS CSR keep track of the state of
the vector registers. (VS bits == DIRTY) means the content of vector
registers have been updated since the last time the VS bits were
updated.
This chain of changes is supposed to change the VS bits to DIRTY for if
any
vector register is potentially updated.
[1] https://gem5-review.googlesource.com/c/public/gem5/+/65272
[2] https://github.com/gem5/gem5/pull/370
Change-Id: I0427890dadc63b74a470d7405807dcfcad18005b
In commit bbc301f2f0 the generalized
`Exception` was changed back to the more specific `HTTPError`.
In this case we do not desire specific error handling. If the connection
to the database fails I want the exception handled in the way outlined:
i.e., i want the connection to be retried 4 times before giving up. With
`HTTPError`, only `HTTPError`s warrent a retry.
Changing this to `HTTPError` cause tests to fail due to a failure to
retry downloading of a resource. Here is an example:
https://github.com/gem5/gem5/actions/runs/6521543885/job/17710779784
In this case `request.urlopen` raised a `URLError`. I suspect this was
some issued to do with reaching the DNS servers. It likely would've
succeeded if it had just tried again.
THe `lvextend` command extends the logical volume. However, the
`resize2fs` command is needed to extend the filesystem to fill the
logical volume.
Prior to this patch the filesystem ran out of space despite there being
enough room in the volume. This was just wasted free space.
This only applies to pseudo instructions with their own encoding (m5
ops)... In other
words memory mapped m5 operations are not supported. This make sense as
they should
rather be treated as device accesses... Though it is something to take
into consideration
when relying on the flag
While it's plausible to define the cache_line_size as a 32-bit unsigned
int, the use of cache_line_size is way out of its original scope.
cache_line_size has been used to produce an address mask, which masking
out the offset bits from an address. For example, [1], [2], [3], and
[4]. However, since the cache_line_size is an "unsigned int", the type
of the value is not guaranteed to be 64-bit long. Subsequently, the bit
twiddling hacks in [1], [2], [3], and [4] produce 32-bit mask, i.e.,
0x00000000FFFFFFC0.
This behavior at least caused a problem in LLSC in RISC-V [5], where the
load reservation (LR) relies on the mask to produce the cache block
address. Two distinct 64-bit addresses can be mapped to the same cache
block using the above mask.
This patch explicitly defines cache_line_size as a 64-bit unsigned int
so the cache block mask can be produced correctly for 64-bit addresses.
[1]
3bdcfd6f7a/src/cpu/simple/atomic.hh (L147)
[2]
3bdcfd6f7a/src/cpu/simple/timing.hh (L224)
[3]
3bdcfd6f7a/src/cpu/o3/lsq_unit.cc (L241)
[4]
3bdcfd6f7a/src/cpu/minor/lsq.cc (L1425)
[5]
3bdcfd6f7a/src/arch/riscv/isa.cc (L787)
Being able to recognise pseudo ops from the static instruction
pointer is actually quite useful in several circumstances
Change-Id: Ib39badf9aabba15ab3ebe7a8e9717583412731e4
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
With -m, you can now run a module from the command line that is embedded
in the gem5 binary.
This will allow us to put some common "scripts" in the stdlib instead of
in the "configs" directory.
PR #367 adds an option to configs/ruby/GPU_VIPER.py that was not added
to the corresponding dGPU equal for GPUFS and thus all GPUFS runs are
failing. Fixed in this patch.
Added checks to ensure that atomics are not performed in the TCC when it
is configured as a write-through cache. Also added SLC bit overwrite to
ensure directory preforms atomics when there is a write-through TCC.
Change-Id: I4514e6c8022aeb7785f2c59871cd9acec8161ed8
After removing the setRegOperand in VecRegOperand
https://github.com/gem5/gem5/pull/341. The vmask_vm_micro will not write
back to register because tmp_d0 is not the reference type. The PR will
make tmp_d0 as reference of regFile.
Change-Id: I2a934ad28045ac63950d4e2ed3eecc4a7d137919
This PR updates cache recorder to use a vector of RubyPorts for cache
cooldown and warmup instead of Sequencer or GPUCoalescer vectors (refer
to issue #403 for more details). It also removes the extra guards that
were added in #377 to prevent compile-time failures in non-GPU builds.
PR https://github.com/gem5/gem5/pull/396 updates the gem5 SST bridge to
use StandardMem in SST. This change updates the nightly tests to use SST
13.0.0 instead of SST 11.1.0. It also updates the dockerfile.
Change-Id: I5c109c40379d2f09601a1c9f19c51dd716c6582e
---------
Signed-off-by: Kaustav Goswami <kggoswami@ucdavis.edu>
Co-authored-by: Bobby R. Bruce <bbruce@ucdavis.edu>
ThumbEE had already been removed but there were still some
references to it dangling around. We were also signaling
ThumbEE as being available through HWCAPS in SE which
was not correct. This patch is fixing it
Change-Id: I8b196f5bd27822cd4dd8b3ab3ad9f12a6f54b047
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Jazelle state has been officially removed in Armv8.
Every AArch32 implementation must still support the
"Trivial Jazelle implementation", which means that while
the instruction set has been removed, it is still possible
for privileged software to access some Jazelle registers
like JIDR,JMCR, and JOSCR which are just treated as RAZ
Change-Id: Ie403c4f004968eb4cb45fa51067178a550726c87
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
A previous commit added BUILD_GPU guards to gpu coalescer models since
a related cache recorder commit added GPU support. This is no longer
needed since the cache recorder moved to using a vector of RubyPorts
instead of Sequencer/GPUCoalescer pointers. This commit removes
BUILD_GPU guards from the Ruby coalescer models
Change-Id: I23a7957d82524d6cd3483d22edfb35ac51796eca
Previously, the cache recorder used a vector of sequencer pointers to
access Ruby objects. A recent commit updated the cache recorder to also
maintain a vector of GPUCoalescer pointers in order for GPUs to support
flushin. This added redundant code to the cache recorder. This commit
replaces the sequencer and GPUCoalescer vectors with a vector of
RubyPort pointers so that the code does not contain redundant lines
Change-Id: Id5da33fb870f17bb9daef816cc43c0bcd70a8706
Here it's more sensible to use a GitHub hosted runner. This job is
miniscule and is used to check the other tests have completed
successfully. It makes sense for this not to be on our own self-hosted
runner.
Change-Id: I5377e025334d43eaedd0fc61e5c708ba61255d28
This is a standard compare and swap but implemented on vector memory
buffer instructions (i.e., it is the same as FLAT_ATOMIC_CMPSWAP with
MUBUF's special address calculation).
This was tested using a Tensile kernel, a backend for rocBLAS, which is
used by PyTorch and Tensorflow. Prior to this patch both ML frameworks
crashed. With this patch they both make forward progress.
Change-Id: Ie76447a72d210f81624e01e1fa374e41c2c21e06
This instruction is used by ML frameworks to prioritize certain
wavefronts. Since gem5 does not have any support for wavefront
scheduling based on priority (besides wavefront age), we ignore this
instruction and warn_once rather than calling panic. Since hardware can
override this priority anyways, we can be sure that ignoring the value
will not inhibit forward progress resulting in application hangs.
Change-Id: Ic5eef14f9685dd2b316c5cf76078bb78d5bfe3cc
Add a --no-kvm-perf option to disable KVM perf counters for GPUFS
scripts. This is useful for users who have KVM enabled but configured
with more restrictive settings, which seems to be the default in newer
Linux distros.
Change-Id: I7508113d0f7c74deb21ea7b2770522885a0ec822
This instruction is used by ML frameworks to prioritize certain
wavefronts. Since gem5 does not have any support for wavefront
scheduling based on priority (besides wavefront age), we ignore this
instruction and warn_once rather than calling panic. Since hardware can
override this priority anyways, we can be sure that ignoring the value
will not inhibit forward progress resulting in application hangs.
Change-Id: Ic5eef14f9685dd2b316c5cf76078bb78d5bfe3cc
This is a standard compare and swap but implemented on vector memory
buffer instructions (i.e., it is the same as FLAT_ATOMIC_CMPSWAP with
MUBUF's special address calculation).
This was tested using a Tensile kernel, a backend for rocBLAS, which is
used by PyTorch and Tensorflow. Prior to this patch both ML frameworks
crashed. With this patch they both make forward progress.
Change-Id: Ie76447a72d210f81624e01e1fa374e41c2c21e06
This change updates the gem5 SST Bridge to use SST 13.0.0. Changes are
made to replace SimpleMem class to StandardMem class as SimpleMem will
be deprecated in SST 14 and above. In addition, the translator.hh is
updated to translate more types of gem5 packets. A new parameter `ports`
was added on SST's side when invoking the gem5 component which does not
require recompiling the gem5 component whenever a new outgoing bridge is
added in a gem5 config.
This adds the [pyupgrade](https://github.com/asottile/pyupgrade) hook to
pre-commit.
This hook automatically upgrades the syntax to the recommended standards
for the newer version of the language.
Memory instructions acquire coalescer tokens in the schedule stage.
Currently this is only done for buffer and flat instructions, but not
flat global or flat scratch. This change now acquires tokens for flat
global and flat scratch instructions. This provides back-pressure to the
CUs and helps to avoid deadlocks in Ruby.
The change also handles returning tokens for buffer, flat global, and
flat scratch instructions. This was previously only being done for
normal flat instructions leading to deadlocks in some applications when
the tokens were exhausted.
To simplify the logic, added a needsToken() method to GPUDynInst which
return if the instruction is buffer or any flat segment.
The waitcnts were also incorrect for flat global and flat scratch. We
should always decrement vmem and exp count for stores and only normal
flat instructions should decrement lgkm. Currently vmem/exp are not
decremented for flat global and flat scratch which can lead to deadlock.
This change set fixes this by always decrementing vmem/exp and lgkm only
for normal flat instructions.
Change-Id: I673f4ac6121e4b5a5e8491bc9130c6d825d95fc5
Simplify indirect predictor interface. Several of the existing
functions where merged together into four clear once. Those
four are similar to the main direction predictor interface.
'lookup', 'update', 'squash' and 'commit'. This makes the
interface much more clear, allows better functionality isolation
and makes it simpler to develop new predictor models.
A new parameter is added to allow additional buffer space for
speculative path history.
Change-Id: I6d6b43965b2986ef959953a64c428e50bc68d38e
Signed-off-by: David Schall <david.schall@ed.ac.uk>
This hook automatically upgrades the syntax to recommended standards for
new versions of the language.
These are numerous and are outlined here:
https://github.com/asottile/pyupgrade
Change-Id: I73fc58a08160ed9a21cfa3b3e023c259a84592ba
This hook detects which symlinks are changed to regular files with the
content of a path which that symlink was pointing to.
Change-Id: Ic925f02debc65c7c04e6d4cc3a25415b30858977
This verifies all Python files have a AST indicating they valid Python.
No file in the repo fails this test, so it triggered no changes.
Change-Id: Ifd7998268df6be766d92c19cfc7f1cfdf8ed103e
Now:
* The Atlas Client will attempt a connection 4 times, using an
exponential backoff approach between attempts.
* When a failure does arise a rich output is given so problems can be
easily diagnosed.
Addresses: #340
This change updates the gem5 SST Bridge to use SST 13.0.0. Changes
are made to replace SimpleMem class to StandardMem class as
SimpleMem will be deprecated in SST 14 and above. In addition, the
translator.hh is updated to translate more types of gem5 packets.
A new parameter `ports` was added on SST's side when invoking the
gem5 component which does not require recompiling the gem5
component whenever a new outgoing bridge is added in a gem5 config.
Change-Id: I45f0013bc35d088df0aa5a71951422cabab4d7f7
Signed-off-by: Kaustav Goswami <kggoswami@ucdavis.edu>
This updates the pre-commit utility from v4.3.0 to v4.5.0 and updates
black from 22.6.0 to 23.9.1.
Change-Id: I7ebb551f30e617059ce49f89a30207f739b1cb14
#371 Updates the runners. This PR updates the tests to:
1. Drop the 'run' and 'build' labels (all runners are now of the same
type).
2. Utilize the threading where possible (runners now have 4 cores
minimum).
The RISC-V vector instructions still work without setRegOperand.
We should fix the register statistic issue by
https://github.com/gem5/gem5/pull/360 to avoid duplicate statistic
register write count
Change-Id: Ib6a52935e00c3e557b366abfcf60450dca05614d
Memory instructions acquire coalescer tokens in the schedule stage.
Currently this is only done for buffer and flat instructions, but not
flat global or flat scratch. This change now acquires tokens for flat
global and flat scratch instructions. This provides back-pressure to the
CUs and helps to avoid deadlocks in Ruby.
The change also handles returning tokens for buffer, flat global, and
flat scratch instructions. This was previously only being done for
normal flat instructions leading to deadlocks in some applications when
the tokens were exhausted.
To simplify the logic, added a needsToken() method to GPUDynInst which
return if the instruction is buffer or any flat segment.
The waitcnts were also incorrect for flat global and flat scratch. We
should always decrement vmem and exp count for stores and only normal
flat instructions should decrement lgkm. Currently vmem/exp are not
decremented for flat global and flat scratch which can lead to deadlock.
This change set fixes this by always decrementing vmem/exp and lgkm only
for normal flat instructions.
Change-Id: I673f4ac6121e4b5a5e8491bc9130c6d825d95fc5
Earlier, GPU checkpointing was working only if a checkpoint was created
before the first kernel execution. This pull request adds support to
checkpoint in-between any two kernel calls. It does so by doing the
following.
- Adds flush support in the GPU_VIPER protocol
- Adds flush support in the GPUCoalescer
- Updates cache recorder to use the GPUCoalescer during simulation
cooldown and cache warmup times.
The new implementation matches the table in the ARM Architecture
Reference Manual (version DDI 0487J.a, section D1.3.6, table R_SXLWJ)
It takes into consideration features like FEAT_SEL2 (scr.eel2 bit) and
FEAT_VHE (hcr.e2h bit) which affect the masking of interrupts under
certain circumstances
The new implementation matches the table in the ARM Architecture
Reference Manual (version DDI 0487J.a, section D1.3.6, table R_SXLWJ)
It takes into consideration features like FEAT_SEL2 (scr.eel2 bit) and
FEAT_VHE (hcr.e2h bit) which affect the masking of interrupts under
certain circumstances
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Change-Id: I07ebd8d859651475bd32fd201eea0f4e64a7dd5f
We pay a small duplication cost but we make the code
more readable and we enable further modifications to the
AArch64 code without forcing the same code on the AArch32
method
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Change-Id: I1efa33cf19f91094fd33bd48b6a0a57d8df8f89f
This `mkdir` is problematic as it doesn't create the directory
recursively. This casues errors if `dir` is `X/Y/Z` and both `Y` and `Z`
has not been created. An error will be returned (`No such file or
directory`).
This issue was fixed with: https://github.com/gem5/gem5/pull/263. The
checkpointing code already recursively creates directories as needed.
Ergo was can remove this `mkdir` statement.
Now:
* The Atlas Client will attempt a connection 4 times, using an
exponential backoff approach between attempts.
* When a failure does arise a rich output is given so problems can be
easily diagnosed.
Change-Id: I3df332277c33a040c0ed734b9f3e28f38606af44
This PR adds two commit to handle timestamps in the ROCm runtime. ROCr
uses a mix of GPU timestamp reads and HSA packet timestamps to output
profiling information for a task dispatch.
The first patch added timestamps to the HSA completion signal indicating
when the task started and ended and require changing the flow of
completion signal DMAs to ensure the DMA of the timestamp values
completed before writing the completion signal value.
Second commit adds MMIOs for reading the GPU's timestamp counter. This
MMIO resides in the GFX MMIO space so a new class is added to handle
MMIOs in that address range.
Current [TraceCPU
documentation](https://www.gem5.org/documentation/general_docs/cpu_models/TraceCPU)
still references the deprecated **se.py/fs.py** scripts for elastic
trace generation (script paths are also outdated).
With this PR we provide a simpler Arm based elastic trace generation
script that can
be used out of the box by a user or that can be extended as needed.
This `mkdir` is problematic as it doesn't create the directory
recursively. This casues errors if `dir` is `X/Y/Z` and both `Y` and `Z`
has not been created. An error will be returned (`No such file or
directory`).
This issue was fixed with: https://github.com/gem5/gem5/pull/263. The
checkpointing code already recursively creates directories as needed.
Ergo was can remove this `mkdir` statement.
Change-Id: Ibae38267c8ee1eba76d7834367aa1c54013365bc
This automatically formats .yaml files. By deault has the following
parameters:
* `mapping`: 4 spaces.
* `sequence`: 6 spaces.
* `offset`: 4 spaces.
* `colons`: do not align top-level colons.
* `width`: None.
Change-Id: Iee5194cd57b7b162fd7e33aa6852b64c5578a3d2
Exposed in our failing compiler tests:
https://github.com/gem5/gem5/actions/runs/6348223508, this PR:
* Adds missing overrides to `PCState`'s `set` function.
* Removes `std::binary_function` from DramPower (it was deprecated in
CPP-11 and officially removed in CPP-17).
Added a parameter (_disk_device) to kernel_disk_workload which allows
users to change the disk device location. get_disk_device() now chooses
between the parameter and, if no parameter was passed, it calls a new
function _get_default_disk_device() which is implemented by each board
and has a default disk device according to each board, eg /dev/hda in
the x86_board. The previous way of setting a disk device still exists as
a default, however, with the new function users can now override this
default
This comment was left in the codebase in error. The
`set_se_binary_workload` function works fine with multi-threaded
applications. This hasn't been a restriction for some time.
This is the first PR in a series of enhancements to the BPU proposed in
#358.
However, I think putting everything into one PR is not nice to review
and prone to oversee I might did.
This PR restructures the BTB:
- A new abstract BTB class is created to enable different BTB
implementations. The new BTB class gets its own parameter and stats.
- An enum is added to differentiate branch instruction types. This enum
is used to enhance statistics and BPU management.
- The existing BTB is moved into `simple_btb` as default.
- An additional function is added to store the static instruction in the
BTB. This function is used for the decoupled front-end.
- Update configs to match new BTB parameters.
These were previously only running on single-threaded machines. Now
they'll be running on 4-core VMs so may as well run tests in parallel.
Change-Id: I7ee86512dc72851cea307dfd800dcf9a02f2f738
The new script will automatically use the newly
defined O3_ARM_v7a_3_Etrace CPU to run a simple SE simulation while
generating elastic trace files.
The script is based on starter_se.py, but contains the following
limitations:
1) No L2 cache as it might affect computational delay calculations
2) Supporting SimpleMemory only with minimal memory latency
There restrictions were imported by the existing elastic trace
generation logic in the common library (collected by grepping
elastic_trace_en) [1][2][3]
Example usage:
build/ARM/gem5.opt configs/example/arm/etrace_se.py \
--inst-trace-file [INSTRUCTION TRACE] \
--data-trace-file [DATA TRACE] \
[WORKLOAD]
[1]: https://github.com/gem5/gem5/blob/stable/\
configs/common/MemConfig.py#L191
[2]: https://github.com/gem5/gem5/blob/stable/\
configs/common/MemConfig.py#L232
[3]: https://github.com/gem5/gem5/blob/stable/\
configs/common/CacheConfig.py#L130
Change-Id: I021fc84fa101113c5c2f0737d50a930bb4750f76
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
According to the original paper [1] the elastic trace generation process
requires a cpu with a big number of entries in the ROB, LQ and SQ, so
that there are no stalls due to resource limitation.
At the moment these numbers are copy pasted from the
CpuConfig.config_etrace method [2].
[1]: https://ieeexplore.ieee.org/document/7818336
[2]: https://github.com/gem5/gem5/blob/stable/\
configs/common/CpuConfig.py#L40
Change-Id: I00fde49e5420e420a4eddb7b49de4b74360348c9
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
We define a new parent (ClusterSystem) to model a system
with one or more cpu clusters within it.
The idea is to make this new base class reusable by SE
systems/scripts as well (like starter_se.py)
Change-Id: I1398d773813db565f6ad5ce62cb4c022cb12a55a
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Modified the x86 KVM-in-SE syscall handler to flush the TLB following
each syscall, in case the page table has been modified. This is done by
reloading the value in %cr3. Doing this requires an intermediate GPR,
which we store in a new scratch buffer following the syscall code at
address `syscallDataBuf`.
GitHub issue: https://github.com/gem5/gem5/issues/409
- A new abstract BTB class is created to enable different BTB
implementations. The new BTB class gets its own parameter
and stats.
- An enum is added to differentiate branch instruction types.
This enum is used to enhance statistics and BPU management.
- The existing BTB is moved into `simple_btb` as default.
- An additional function is added to store the static instruction in
the BTB. This function is used for the decoupled front-end.
- Update configs to match new BTB parameters.
Change-Id: I99b29a19a1b57e59ea2b188ed7d62a8b79426529
Signed-off-by: David Schall <david.schall@ed.ac.uk>
This is still trying to completely remove any artifact
which implies virtualization is only supported in
non-secure mode (NS=1)
Change-Id: I83fed1c33cc745ecdf3c5ad60f4f356f3c58aad5
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This info can be used during TLB invalidation
Change-Id: I81247e40b11745f0207178b52c47845ca1b92870
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The syscall emulation of brk() incorrectly did not ensure that newly
allocated memory was zero-initialized, which Linux guarantees and which
seems to be the expectation of glibc's malloc() and free()
implementation. This patch fixes the incorrect behavior by zero-
initalizing all memory allocations via brk().
GitHub issue: https://github.com/gem5/gem5/issues/342
Change-Id: I53cf29d6f3f83285c8e813e18c06c2e9a69d7cc2
The PR is fixing the CHI fromSequencer helper function which is making
use of the undefined tbe entry.
This has been broken by #177
Change-Id: I52feff4b5ab2faf0aa91edd6572e3e767c88e257
1. All VMs are deployable from a single Vagrantfile (per host machine).
2. Runners within VMs are now ephemeral. They cease to exist after a job
is complete. After the VM cleans the workspace and creates a new runner.
This will reduce old data, scripts, and images causing space issues on
our VMs
3. No more 'vm_manager.sh' script. The standard `vagrant` command to
manage the VMs will work.
4. Adds Copyright notices where missing.
This patch removed the bespoke "vm_manager.sh" script in favor of a
Multi-Machine Vagrantfile.
With this the users needs to only change the variables in Vagrantfile
then use the standard `vagrant` commands to launch the VMs/Runners.
Change-Id: Ida5d2701319fd844c6a5b6fa7baf0c48b67db975
Prior to this change we were limited to a root partition with only 60GB
of space which caused issues when running larger simulations (see:
https://github.com/gem5/gem5/issues/165).
There are two factors in this issue which this patch resolves:
1. The root partition in the VM was capped at 60GB despite the virtual
machines size being capped at 128GB. This resulted in libvirt giving the
VM free space it couldn't use. To fix this `lvextend` was added to the
"provision_root.sh" script to resize the root partition to fill the
available space.
2. The virtual machine size can be set via the `machine_virtual_size`
parameter. The minimum and default value is 128GB. This wasn't exposed
previously. Now, if we required, we can increase the size of the VM/Root
partition if we require (though I believe 128GB is more than sufficient
for now).
Fixes: https://github.com/gem5/gem5/issues/165
Change-Id: I82dd500d8807ee0164f92d91515729d5fbd598e3
The "action-run.sh" action replaces inline scripting in the Vagrantfile.
The major improvement is this script runs an infinite loop and
configures the runners to be ephemeral. This means they cease to exist
after a job is complete. The script then cleans the VM workspace and the
loop restarts by configuring and setting up another runner. This means
our VMs no longer accumulate files that eventually lead to the VM
running out of space.
Change-Id: Iba6dc9a480f5805042602f120fc84bdc47a96d55
There are two places self-hosted runners can exist on GitHub:
1. At the level of the repository: In this case the runners can only be
used by that repository and runners can only be distinguished from one
another by labels.
2. At the level of the organization: In this case the runners can be
used by any repository in the organization, thus increasing their
versatility. In addition to labels, runners in the level of the
organization can be organized into groups.
While we do not use our self-hosted runners on other repositories, there
may be future use for this, so we might as well enable it now.
Change-Id: Id5e113194314336221dcdc8c2858b352afcbaf6e
Having two types of GitHub Action Runners has not yielded much benefit
and caused confusion and inefficiencies. This change simplifies things
to having just one runner with 8-cores and 16GB of memory. It is
sufficient to build gem5 and run most simulations.
Change-Id: Ic49ae5e98b02086f153f4ae2a4eedd8a535786c8
Modified the x86 KVM-in-SE syscall handler to flush the TLB following
each syscall, in case the page table has been modified. This is done
by reloading the value in %cr3. Doing this requires an intermediate
GPR, which we store in a new scratch buffer following the syscall code
at address `syscallDataBuf`.
GitHub issue: https://github.com/gem5/gem5/issues/409
Change-Id: Ibc20018c97ebb1794fa31a0c71e0857d661c7c9d
gem5::MemState::updateBrkRegion(), which is called during the syscall
emulation of brk, did not unmap deallocated heap pages when the brk
region is receding. Instead, it kept it mapped for simplicity. This
introduced a bug where subequent expansions of the brk region reused
prior heap page mappings that were not zero-filled. This violates
the assumptions of glibc malloc, resulting in heap corruption and
crashes.
This patch fixes the bug by always unmapping pages that are deallocated
during a call to brk() that reduces the heap size. This makes the
gem5::MemState::_endBrkPoint field obsolete, so this patch removes it.
GitHub issue: https://github.com/gem5/gem5/issues/342
Change-Id: Ib2244e1aa4d2a26666ad60d231fdde2c22d2df35
The ROCr runtime uses a combination of HSA signal timestamps and
hardware MMIOs to calculate profiling times. At the beginning of an
application a timestamp is read from the GPU using MMIOs. The clock
MMIOs reside in the GFX MMIO region, so a new AMDGPUGfx class is added
to handle these MMIOs.
The timestamp value is expected to be in nanoseconds, so we simply use
the gem5 tick converted to ns.
Change-Id: I7d1cba40d5042a7f7a81fd4d132402dc11b71bd4
The AMD specific HSA signal contains start/end timestamps for dispatch
packet completion signals. These are current always zero. These
timestamp values are used for profiling in the ROCr runtime.
Unfortunately, the GpuAgent::TranslateTime method in ROCr does not check
for zero values before dividing, causing applications that use profiling
to crash with SIGFPE. Profiling is used via hipEvents in the HACC
application, so these should be supported in gem5.
In order to handle writing the timestamp values, we need to DMA the
values to memory before writing the completion signal. This changes the
flow of the async completion signal write to be (1) read mailbox pointer
(2) if valid, write the mailbox data, other skip to 4 (3) write mailbox
data if pointer is valid (4) write timestamp values (5) write completion
signal. The application will process the timestamp data as soon as the
completion signal is received, so we need to ordering to ensure the DMA
for timestamps was completed.
HACC now runs to completion on GPUFS and has the same output was
hardware.
Change-Id: I09877cdff901d1402140f2c3bafea7605fa6554e
Added a new feature to CHI protocol (in collaboration with @tiagormk).
Here is the Jira Ticket
[https://gem5.atlassian.net/browse/GEM5-1326](https://gem5.atlassian.net/browse/GEM5-1326
). As described in CHI specs, far atomic transactions enable remote
execution of Atomic Memory Operations. This pull request incorporates
several changes:
* Fix Arm ISA definition of Swap instructions. These instructions should
return an operand, so their ISA definition should be Return Operation.
* Enable AMOs in Ruby Mem Test to verify that AMOs work
* Enable near and far AMO in the Cache Controler of CHI
Three configuration parameters have been used to tune this behavior:
* policy_type: sets the atomic policy to one of the described in [our
paper](https://dl.acm.org/doi/10.1145/3579371.3589065)
* atomic_op_latency: simulates the AMO ALU operation latency
* comp_anr: configures the Atomic No return transaction to split
CompDBIDResp into two different messages DBIDResp and Comp
This enables two things,
- /chosen/stdout-path is now default to uart@10000000, meaning
the linux kernel's boot console will be redirected to uart.
- /chosen/bootargs now contains the boot arguments obtained from
gem5's library. This allows passing the boot arguments to the
linux kernel via the device tree.
Change-Id: I53821d85f619e6276da85f41c972c041eaaf3280
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
Ruby was recently updated to support flushes and warmup for GPUs. Since
this support uses the GPUCoalescer, non-GPU builds face a compile time
issue. This is because GPU code is not built for non-GPU builds. This
commit addes "#if BUILD_GPU" guards around the GPU-related code in
common files like AbstractController.hh, CacheRecorder.*, RubySystem.cc,
GPUCoalescer.hh, and VIPERCoalescer.hh. This support allows GPU builds
to use flushing while non-GPU builds compile without problems
Change-Id: If8ee4ff881fe154553289e8c00881ee1b6e3f113
Previously, the L1, L2 number of banks and L2 latencies were not
configurable through command line arguments. This commit adds support to
configure them through the arguments '--tcp-num-banks' for number of
banks in L1, '--tcc-num-banks' for number of banks in L2, and
'--tcc-tag-access-latency', and '--tcc-data-access-latency'
Change-Id: Ie3b713ead16865fd7120e2d809ebfa56b69bc4a1
ROCm supports dynamically allocating scratch space, which resides in
framebuffer memory, to reduce the amount of memory allocated for kernels
that have not yet launched. The size of the scratch space allocated is
located in task->amdQueue.compute_tmpring_size_wavesize. This size is in
kilobytes. The AQL task contains the number of bytes requested *per work
item*, however we currently check if there is enough tmpring space by
comparing a single work item. This should instead check the size *per
wavefront*.
This causes problems in applications where multiple kernels use dynamic
scratch allocation and a later kernel requires more space than the
earlier kernel. The only application being tested that does this is
LULESH. This was resulting in the scratch space being too small,
resulting in workgroups clobbering each other's private memory leading
to some nasty bugs. It is fixed by this patch as task->amdQueue will be
re-read from the host and will contain the updated tmpring size. After
this there is enough scratch space and LULESH makes forward progress.
This comment was left in the codebase in error. The
`set_se_binary_workload` function works fine with multi-threaded
applications. This hasn't been a restriction for some time.
Change-Id: I1b1d27c86f8d9284659f62ae27d752bf5325e31b
`std::binary_function` was deprecated in C++11 and officially
removed in CPP-17.
This caused a compilation error on some systems. Fortunately it can be
safely removed. It was unecessary. The commandItemSorter was compliant
witih the `sort` regardless.
Change-Id: I0d910e50c51cce2545dd89f618c99aef0fe8ab79
As highlighed in this failing compiler test:
https://github.com/gem5/gem5/actions/runs/6348223508/job/17389057995
Clang was failing when compiling "build/ALL/gem5.opt" due missing
overrides in `PCState`'s "set" function.
This was observed in Clang-14 and, stangely, Clang-8.
Change-Id: I240c1087e8875fd07630e467e7452c62a5d14d5b
Introduce far atomic operations in CHI protocol.
Three configuration parameters have been used to tune this behavior:
policy_type: sets the atomic policy to one of the described in our paper
atomic_op_latency: simulates the AMO ALU operation latency
comp_anr: configures the Atomic No return transaction to split
CompDBIDResp into two different messages DBIDResp and Comp
Change-Id: I087afad9ad9fcb9df42d72893c9e32ad5a5eb478
Swap instructions are configured as non returning AMO operations. This is wrong because they
return the previous value stored in the target memory position
Change-Id: I84d75a571a8eaeaee0dbfac344f7b34c72b47d53
This introduces the changes necessary for clang-15 and clang-16 to run
within gem5, and adds them to the compiler tests.
This also updates the dockerfiles for ubuntu 22.04 to include the steps
necessary to compile clang-15 and clang-16.
ROCm supports dynamically allocating scratch space, which resides in
framebuffer memory, to reduce the amount of memory allocated for kernels
that have not yet launched. The size of the scratch space allocated is
located in task->amdQueue.compute_tmpring_size_wavesize. This size is in
kilobytes. The AQL task contains the number of bytes requested *per work
item*, however we currently check if there is enough tmpring space by
comparing a single work item. This should instead check the size *per
wavefront*.
This causes problems in applications where multiple kernels use dynamic
scratch allocation and a later kernel requires more space than the
earlier kernel. The only application being tested that does this is
LULESH. This was resulting in the scratch space being too small,
resulting in workgroups clobbering each other's private memory leading
to some nasty bugs. It is fixed by this patch as task->amdQueue will be
re-read from the host and will contain the updated tmpring size. After
this there is enough scratch space and LULESH makes forward progress.
Change-Id: Ie9e0f92bb98fd3c3d6c2da3db9ee65352f9ae070
Add the instruction size of a static instruction. x86 and arm decoders
add now the instruction size to the macro instruction. However, microops
are still handled by the fetch stage which is not nice.
Furthermore, we add a set method to the PC state. It allows setting a PC
state to acertain address.
Both methods are required for the decoupled front-end.
Change-Id: I311fe3f637e867c42dee7781f5373ea2e69e2072
GPUFS+KVM simulations automatically enable AVX. This commit adds a
command line option to disable AVX if its not needed for a GPUFS
simulation.
Change-Id: Ic22592767dbdca86f3718eca9c837a8e29b6b781
Previously, the L1, L2 number of banks and L2 latencies were not
configurable through command line arguments. This commit adds support to
configure them through the arguments '--tcp-num-banks' for number of
banks in L1, '--tcc-num-banks' for number of banks in L2, and
'--tcc-tag-access-latency', and '--tcc-data-access-latency'
Change-Id: Ie3b713ead16865fd7120e2d809ebfa56b69bc4a1
During checkpoint restoration, the unserialize() function writes rptr,
wptr, and indirect buffer rptr, wptr to PM4 queue's rptr, wptr fields.
This commit updates this to write only the relevant pointers to the
queue structure. If indirect buffers are used, then it writes only the
indirect buffer pointers to the queue. If they are not used, then it
writes rptr, wptr values to the queue.
Change-Id: Iedb25a726112e1af99cc1e7bc012de51c4ebfd45
Previously, the cache recorder used the Sequencer to issue flush
requests and cache warmup requests. The GPU however uses GPUCoalescer to
access the cache, and not the Sequencer. This commit adds a GPUCoalescer
map to the cache recorder and uses it to send flushes and cache warmup
requests to any GPU caches in the system
Change-Id: I10490cf5e561c8559a98d4eb0550c62eefe769c9
This commit adds flush support to the GPU VIPER coherence protocol. The
L1 cache will now initiate a flush request if the packet it receives
is of type RubyRequestType_FLUSH. During the flush process, the L1 cache
will a request to L2 if its in either V or I state. L2 will issue a
flush request to the directory if its cache line is in the valid
state before invalidating its copy. The directory, on receiving this
request, writes data to memory and sends an ack back to the L2. L2
forwards this ack back to the L1, which then ends the flush by calling
the write callback
Change-Id: I9dfc0c7b71a1e9f6d5e9e6ed4977c1e6a3b5ba46
The GPU Coalescer does not contain cache cooldown and warmup support.
This commit updates the coalsecer to support cache cooldown during flush
and warmup during checkpoint restore.
Change-Id: I5459471dec20ff304fd5954af1079a7486ee860a
GPUFS uses aql information from PM4 queues to initialize doorbells. This
commit adds aql information to the checkpoint so that it can be used
during restoration to correctly initialize all doorbells. Additionally,
this commit also sets the hsa queue correctly during checkpoint-restoration
Change-Id: Ief3ef6dc973f70f27255234872a12c396df05d89
I believe the point of this binary was to allow people to use the m5
objects without the entire gem5 binary. However, without adding the
importer call, this did not work. Unfortunately, with the importer call
there is a circular dependence on the original gem5py.cc file.
Therefore, this change creates a new file that has the importer call.
Now, with the `gem5py_m5` binary you can run python code that references
modules in `src/python`. Note that `_m5` is not available, so anything
that depends on the gem5 SimObjects' implementation will not work.
However, this can still be useful for things like getting Resources,
processing stats, etc.
Adds the instruction size to all static instruction. x86, arm
and RISC-V decoders add the instruction size to every decoded
macro instruction. As microops should reflect the size of the
their parent macroop the set method is overwritten to pass the
size to all microops.
Furthermore, we add a set method to the PC state. It allows
setting a PC state to a certain address.
Both methods are required for the decoupled front-end.
Change-Id: I311fe3f637e867c42dee7781f5373ea2e69e2072
Signed-off-by: David Schall <david.schall@ed.ac.uk>
* Removes `+` symbol accidently left in (this broke building).
* Removes `ARG` thus making the Docker exclusively for clang-16.
* Adds "llvm.sh" to the repo. This stops us being dependent on the url
download. The script is under the apache license therefore compatible.
* Merges several `apt install` commands into one.
Change-Id: Iaf411656aac83f67f5395b20efd96ecc1eabb263
The L3 cache did not work due to argument type mismatch in the call to
the constructor `DMAController`. The second argument is expecting a
`RubySystem` type but the code passes in a `cache_line_size` variable.
After I change the second argument to `self.ruby_system` everything
works.
There are two overloaded-virtual issues reported by g++13.
1. Copy assignment and move assignment overload is hidden in the derived
class
[ CXX] src/mem/cache/replacement_policies/weighted_lru_rp.cc ->
ALL/mem/cache/replacement_policies/weighted_lru_rp.o
In file included from src/mem/cache/base.hh:61,
from src/mem/cache/base.cc:46:
src/mem/cache/cache_blk.hh:172:5: error: ‘virtual gem5::CacheBlk&
gem5::CacheBlk::operator=(gem5::CacheBlk&&)’ was hidden
[-Werror=overloaded-virtual=]
172 | operator=(CacheBlk&& other)
| ^~~~~~~~
src/mem/cache/cache_blk.hh:518:19: note: by ‘gem5::TempCacheBlk&
gem5::TempCacheBlk::operator=(const gem5::TempCacheBlk&)’
518 | TempCacheBlk& operator=(const TempCacheBlk&) = delete;
| ^~~~~~~~
In this case, we can exiplict using parent operator= to keep the
function overload.
2. Intended overload hidden in SystemC is reported as error.
In file included from
src/systemc/ext/tlm_utils/simple_initiator_socket.h:24,
from src/systemc/tlm_bridge/gem5_to_tlm.hh:72,
from build/ALL/python/_m5/param_Gem5ToTlmBridge256.cc:17:
src/systemc/ext/tlm_utils/../tlm_core/2/sockets/initiator_socket.hh: In
instantiation of ‘class tlm::tlm_base_initiator_socket<256,
tlm::tlm_fw_transport_if<>, tlm::tlm_bw_transport_if<>, 1,
sc_core::SC_ONE_OR_MORE_BOUND>’:
src/systemc/ext/tlm_utils/../tlm_core/2/sockets/initiator_socket.hh:185:7:
required from ‘class tlm::tlm_initiator_socket<256,
tlm::tlm_base_protocol_types, 1, sc_core::SC_ONE_OR_MORE_BOUND>’
src/systemc/ext/tlm_utils/simple_initiator_socket.h:37:7: required from
‘class
tlm_utils::simple_initiator_socket_b<sc_gem5::Gem5ToTlmBridge<256>, 256,
tlm::tlm_base_protocol_types, sc_core::SC_ONE_OR_MORE_BOUND>’
src/systemc/ext/tlm_utils/simple_initiator_socket.h:156:7: required from
‘class tlm_utils::simple_initiator_socket<sc_gem5::Gem5ToTlmBridge<256>,
256, tlm::tlm_base_protocol_types>’
src/systemc/tlm_bridge/gem5_to_tlm.hh:147:46: required from ‘class
sc_gem5::Gem5ToTlmBridge<256>’
/usr/include/c++/13/type_traits:1411:38: required from ‘struct
std::is_base_of<sc_gem5::Gem5ToTlmBridgeBase,
sc_gem5::Gem5ToTlmBridge<256> >’
ext/pybind11/include/pybind11/detail/../detail/common.h:880:59: required
from ‘struct pybind11::class_<sc_gem5::Gem5ToTlmBridge<256>,
sc_gem5::Gem5ToTlmBridgeBase,
std::unique_ptr<sc_gem5::Gem5ToTlmBridge<256>, pybind11::nodelete>
>::is_valid_class_option<sc_gem5::Gem5ToTlmBridgeBase>’
ext/pybind11/include/pybind11/detail/../detail/common.h:719:35: required
by substitution of ‘template<class ... Ts> using
pybind11::detail::all_of = pybind11::detail::bool_constant<(Ts::value &&
...)> [with Ts = {pybind11::class_<sc_gem5::Gem5ToTlmBridge<256>,
sc_gem5::Gem5ToTlmBridgeBase,
std::unique_ptr<sc_gem5::Gem5ToTlmBridge<256>, pybind11::nodelete>
>::is_valid_class_option<sc_gem5::Gem5ToTlmBridgeBase>,
pybind11::class_<sc_gem5::Gem5ToTlmBridge<256>,
sc_gem5::Gem5ToTlmBridgeBase,
std::unique_ptr<sc_gem5::Gem5ToTlmBridge<256>, pybind11::nodelete>
>::is_valid_class_option<std::unique_ptr<sc_gem5::Gem5ToTlmBridge<256>,
pybind11::nodelete> >}]’
ext/pybind11/include/pybind11/pybind11.h:1506:70: required from ‘class
pybind11::class_<sc_gem5::Gem5ToTlmBridge<256>,
sc_gem5::Gem5ToTlmBridgeBase,
std::unique_ptr<sc_gem5::Gem5ToTlmBridge<256>, pybind11::nodelete> >’
build/ALL/python/_m5/param_Gem5ToTlmBridge256.cc:34:179: required from
here
src/systemc/ext/tlm_utils/../core/sc_port.hh:125:18: error: ‘void
sc_core::sc_port_b<IF>::bind(sc_core::sc_port_b<IF>&) [with IF =
tlm::tlm_fw_transport_if<>]’ was hidden [-Werror=overloaded-virtual=]
125 | virtual void bind(sc_port_b<IF> &p) { sc_port_base::bind(p); }
| ^~~~
In file included from
src/systemc/ext/tlm_utils/simple_initiator_socket.h:27:
src/systemc/ext/tlm_utils/../tlm_core/2/sockets/initiator_socket.hh:133:18:
note: by ‘tlm::tlm_base_initiator_socket<256,
tlm::tlm_fw_transport_if<>, tlm::tlm_bw_transport_if<>, 1,
sc_core::SC_ONE_OR_MORE_BOUND>::bind’
133 | virtual void bind(bw_interface_type &ifs) {
(get_base_export())(ifs); }
| ^~~~
src/systemc/ext/tlm_utils/../core/sc_port.hh:124:18: error: ‘void
sc_core::sc_port_b<IF>::bind(IF&) [with IF =
tlm::tlm_fw_transport_if<>]’ was hidden [-Werror=overloaded-virtual=]
124 | virtual void bind(IF &i) { sc_port_base::bind(i); }
| ^~~~
src/systemc/ext/tlm_utils/../tlm_core/2/sockets/initiator_socket.hh:133:18:
note: by ‘tlm::tlm_base_initiator_socket<256,
tlm::tlm_fw_transport_if<>, tlm::tlm_bw_transport_if<>, 1,
sc_core::SC_ONE_OR_MORE_BOUND>::bind’
133 | virtual void bind(bw_interface_type &ifs) {
(get_base_export())(ifs); }
| ^~~~
From the code comment, it's intended in SystemC header.
// The overloaded virtual is intended in SystemC, so we'll disable the
warning. // Please check section 9.3 of SystemC 2.3.1 release note for
more details.
The issue is we should move the skip to the base class.
This patch introduces a new category called "suite".
A suite is a collection of workloads.
Each workload in a SuiteResource has a tag that can be narrowed down
through the function with_input_group.
Also, the set of input groups can be seen through list_input_groups.
Added unit tests to test all functions of SuiteResource class.
Change-Id: Iddda5c898b32b7cd874987dbe694ac09aa231f08
Co-authored-by: Kunal Pai <kunpai@ucdavis.edu>
This problem is similar to the problem described in [1]. This problem
produces symptoms as described in [2].
In short, the Linux kernel relies on the CSR_STATUS's FS bits to decide
whether to save the floating point registers. If the FS bits are set to
DIRTY, the floating point registers will be saved during context
switching / task switching.
Currently, with the patch in [1], we only change the FS bits upon every
floating arithmetic instruction. However, since floating load
instructions also mutate the state of floating point registers, the FS
bits should be updated to DIRTY.
The problem in [2] arose when the program populates the content of one
floating register to an array by repeatedly using `fld fa5, EA`. A
context switch occured upon a page fault, and while handling that page
fault, the kernel might have to handle an interrupt. This caused the
kernel to task switch between handling page fault and handling
interrupt. This caused __switch_to() to be called, which will save the
floating point registers only if the SD (indirectly set by FS) bits are
set to DIRTY, while restoring the floating point registers to the
switch-to task [3]. This caused the floating point registers to be
zeroed out when it was restored as it was never saved before.
[1] https://gem5-review.googlesource.com/c/public/gem5/+/65272
[2] https://github.com/gem5/gem5/issues/349
[3]
https://github.com/torvalds/linux/blob/v6.5/arch/riscv/include/asm/switch_to.h#L56
This updates the dockerfiles for ubuntu 22.04 to include the
steps necessary to compile clang-15 and clang-16.
Change-Id: I2bba6393ab93a6ce05a2c3ce31f3bbc71bcdca7c
This problem is similar to the problem described in [1].
This problem produces symptoms as described in [2].
In short, the Linux kernel relies on the CSR_STATUS's FS bits
to decide whether to save the floating point registers. If
the FS bits are set to DIRTY, the floating point registers will
be saved during context switching / task switching.
Currently, with the patch in [1], we only change the FS bits
upon every floating arithmetic instruction. However, since
floating load instructions also mutate the state of floating
point registers, the FS bits should be updated to DIRTY.
The problem in [2] arose when the program populates the content
of one floating register to an array by repeatedly using
`fld fa5, EA`. A context switch occured upon a page fault, and
while handling that page fault, the kernel might have to handle
an interrupt. This caused the kernel to task switch between
handling page fault and handling interrupt. This caused
__switch_to() to be called, which will save the floating point
registers only if the SD (indirectly set by FS) bits are set to
DIRTY, while restoring the floating point registers to the
switch-to task [3]. This caused the floating point registers to
be zeroed out when it was restored as it was never saved before.
[1] https://gem5-review.googlesource.com/c/public/gem5/+/65272
[2] https://github.com/gem5/gem5/issues/349
[3] https://github.com/torvalds/linux/blob/v6.5/arch/riscv/include/asm/switch_to.h#L56
Change-Id: Ia5656da5a589a8e29fb699d2ee12885b8f3fa2d2
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
I believe the point of this binary was to allow people to use the m5
objects without the entire gem5 binary. However, without adding the
importer call, this did not work. Unfortunately, with the importer call
there is a circular dependence on the original gem5py.cc file.
Therefore, this change creates a new file that has the importer call.
Now, with the `gem5py_m5` binary you can run python code that references
modules in `src/python`. Note that `_m5` is not available, so anything
that depends on the gem5 SimObjects' implementation will not work.
However, thic can still be useful for things like getting Resources,
processing stats, etc.
Change-Id: I5c0e5d1a669fe5ce491458df916f2049c81292eb
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
By default, the `--stderr-file` and `--stdout-file` arguments were
directing the simulator to output files named "simerr" and "simout"
respectively if an output redirect was requested.
A small annoyance is these files lack an extension meaning programs
refuse to open them, or don't do so withou additional effort. On many
systems they are assumed to scripts.
This patch adds the `.txt` extension to both, thus clearly indicating to
other programs these are text files and can be opened and read as such.
Currently we drop the insertion of a whole symbol table if the name of
one symbol already exists in the base table. Having similar symbols
across different binaries is common.
This change adds a warning and recommends a fix instead of silently
dropping the table.
This PR utilizes GitHub Action's matrix's to automatically distribute
the CI testlib gem5 build and test jobs across available GitHub Action
Runners.
The CI tests (the `quick` testlib tests, i.e. those run with `./main.py
run`) are distributed across the runners on a per directory basis ---
all directories under "tests/gem5" are run as their own jobs.
The necessary gem5 builds for each workflow are now automatically
inferred via the introduction of `./main.py list`'s `--build-targets`
flag which returns the gem5 build target for a given test or collection
of tests. E.g., `./main.py list --build-targets` will return the build
targets for all the `quick` testlib tests and `./main.py list
--build-target --uid=<id>` will return the build targets the test suite
`<id>` requires.
Moving from monolithic jobs to fine-grained ones will make the locaiton
of test failures more obvious. Each job has it's own artifact containing
"test/testing-results" for the tests run in that job. In addition,
maintenance of these files should become less burdensome due to less
hardcoding.
As discussed here, [1], O3CPU counts getWritableRegOperand() as a reg
read, while SimpleCPU variants count getWriableRegOperand() as a reg
write.
This patch fixes this inconsistency. Here, I assume that if
getWritableRegOperand() is used, setReg() will not be used again to
write to the destination register.
[1] https://github.com/gem5/gem5/pull/341
The auxv platform string was not copied to the same location that was
pointed to by the value of AT_PLATFORM; instead, it was copied over the
auxv random buffer. This patch fixes this by copying the auxv platform
string to the right offset in the initial program stack.
GitHub issue: https://github.com/gem5/gem5/issues/346
The popx87 micro-op did not in fact pop the st(0) floating-point
register off the stack; it acted as a no-op. This patch fixes the bug by
passing the spm=1 argument to PopX87's superclass to indicate the
floating-point stack pointer should be incremented.
GitHub issue: https://github.com/gem5/gem5/issues/344
- Added mulitline string for print message
- Added get_category_name method instead of having category as variable
Change-Id: I51e0e14a70e802453c21070711b200bc47994ba3
This introduces the changes necessary for clang-15 and clang-16
to run within gem5, and adds them to the compiler tests.
Change-Id: If809eae1bd8c366b4d62476891feff0625bdf210
There are two overloaded-virtual issues reported by g++13.
1. Copy assignment and move assignment overload is hidden in the derived
class
[ CXX] src/mem/cache/replacement_policies/weighted_lru_rp.cc -> ALL/mem/cache/replacement_policies/weighted_lru_rp.o
In file included from src/mem/cache/base.hh:61,
from src/mem/cache/base.cc:46:
src/mem/cache/cache_blk.hh:172:5: error: ‘virtual gem5::CacheBlk& gem5::CacheBlk::operator=(gem5::CacheBlk&&)’ was hidden [-Werror=overloaded-virtual=]
172 | operator=(CacheBlk&& other)
| ^~~~~~~~
src/mem/cache/cache_blk.hh:518:19: note: by ‘gem5::TempCacheBlk& gem5::TempCacheBlk::operator=(const gem5::TempCacheBlk&)’
518 | TempCacheBlk& operator=(const TempCacheBlk&) = delete;
| ^~~~~~~~
In this case, we can exiplict using parent operator= to keep the
function overload.
2. Intended overload hidden in SystemC is reported as error.
In file included from src/systemc/ext/tlm_utils/simple_initiator_socket.h:24,
from src/systemc/tlm_bridge/gem5_to_tlm.hh:72,
from build/ALL/python/_m5/param_Gem5ToTlmBridge256.cc:17:
src/systemc/ext/tlm_utils/../tlm_core/2/sockets/initiator_socket.hh: In instantiation of ‘class tlm::tlm_base_initiator_socket<256, tlm::tlm_fw_transport_if<>, tlm::tlm_bw_transport_if<>, 1, sc_core::SC_ONE_OR_MORE_BOUND>’:
src/systemc/ext/tlm_utils/../tlm_core/2/sockets/initiator_socket.hh:185:7: required from ‘class tlm::tlm_initiator_socket<256, tlm::tlm_base_protocol_types, 1, sc_core::SC_ONE_OR_MORE_BOUND>’
src/systemc/ext/tlm_utils/simple_initiator_socket.h:37:7: required from ‘class tlm_utils::simple_initiator_socket_b<sc_gem5::Gem5ToTlmBridge<256>, 256, tlm::tlm_base_protocol_types, sc_core::SC_ONE_OR_MORE_BOUND>’
src/systemc/ext/tlm_utils/simple_initiator_socket.h:156:7: required from ‘class tlm_utils::simple_initiator_socket<sc_gem5::Gem5ToTlmBridge<256>, 256, tlm::tlm_base_protocol_types>’
src/systemc/tlm_bridge/gem5_to_tlm.hh:147:46: required from ‘class sc_gem5::Gem5ToTlmBridge<256>’
/usr/include/c++/13/type_traits:1411:38: required from ‘struct std::is_base_of<sc_gem5::Gem5ToTlmBridgeBase, sc_gem5::Gem5ToTlmBridge<256> >’
ext/pybind11/include/pybind11/detail/../detail/common.h:880:59: required from ‘struct pybind11::class_<sc_gem5::Gem5ToTlmBridge<256>, sc_gem5::Gem5ToTlmBridgeBase, std::unique_ptr<sc_gem5::Gem5ToTlmBridge<256>, pybind11::nodelete> >::is_valid_class_option<sc_gem5::Gem5ToTlmBridgeBase>’
ext/pybind11/include/pybind11/detail/../detail/common.h:719:35: required by substitution of ‘template<class ... Ts> using pybind11::detail::all_of = pybind11::detail::bool_constant<(Ts::value && ...)> [with Ts = {pybind11::class_<sc_gem5::Gem5ToTlmBridge<256>, sc_gem5::Gem5ToTlmBridgeBase, std::unique_ptr<sc_gem5::Gem5ToTlmBridge<256>, pybind11::nodelete> >::is_valid_class_option<sc_gem5::Gem5ToTlmBridgeBase>, pybind11::class_<sc_gem5::Gem5ToTlmBridge<256>, sc_gem5::Gem5ToTlmBridgeBase, std::unique_ptr<sc_gem5::Gem5ToTlmBridge<256>, pybind11::nodelete> >::is_valid_class_option<std::unique_ptr<sc_gem5::Gem5ToTlmBridge<256>, pybind11::nodelete> >}]’
ext/pybind11/include/pybind11/pybind11.h:1506:70: required from ‘class pybind11::class_<sc_gem5::Gem5ToTlmBridge<256>, sc_gem5::Gem5ToTlmBridgeBase, std::unique_ptr<sc_gem5::Gem5ToTlmBridge<256>, pybind11::nodelete> >’
build/ALL/python/_m5/param_Gem5ToTlmBridge256.cc:34:179: required from here
src/systemc/ext/tlm_utils/../core/sc_port.hh:125:18: error: ‘void sc_core::sc_port_b<IF>::bind(sc_core::sc_port_b<IF>&) [with IF = tlm::tlm_fw_transport_if<>]’ was hidden [-Werror=overloaded-virtual=]
125 | virtual void bind(sc_port_b<IF> &p) { sc_port_base::bind(p); }
| ^~~~
In file included from src/systemc/ext/tlm_utils/simple_initiator_socket.h:27:
src/systemc/ext/tlm_utils/../tlm_core/2/sockets/initiator_socket.hh:133:18: note: by ‘tlm::tlm_base_initiator_socket<256, tlm::tlm_fw_transport_if<>, tlm::tlm_bw_transport_if<>, 1, sc_core::SC_ONE_OR_MORE_BOUND>::bind’
133 | virtual void bind(bw_interface_type &ifs) { (get_base_export())(ifs); }
| ^~~~
src/systemc/ext/tlm_utils/../core/sc_port.hh:124:18: error: ‘void sc_core::sc_port_b<IF>::bind(IF&) [with IF = tlm::tlm_fw_transport_if<>]’ was hidden [-Werror=overloaded-virtual=]
124 | virtual void bind(IF &i) { sc_port_base::bind(i); }
| ^~~~
src/systemc/ext/tlm_utils/../tlm_core/2/sockets/initiator_socket.hh:133:18: note: by ‘tlm::tlm_base_initiator_socket<256, tlm::tlm_fw_transport_if<>, tlm::tlm_bw_transport_if<>, 1, sc_core::SC_ONE_OR_MORE_BOUND>::bind’
133 | virtual void bind(bw_interface_type &ifs) { (get_base_export())(ifs); }
| ^~~~
From the code comment, it's intended in SystemC header.
// The overloaded virtual is intended in SystemC, so we'll disable the warning.
// Please check section 9.3 of SystemC 2.3.1 release note for more details.
The issue is we should move the skip to the base class.
Change-Id: I6683919e594ffe1fb3b87ccca1602bffdb788e7d
Adds a new probe listener template which can be used to instantiate with
a lambda function that is called by notify(). It is similar to
ProbeListenerArg with class but provides more flexibility. I.e. the can
be another object than the one instantiating the lambda which allows to
listen to any object. Furthermore additional parameters can be passed in
easily.
Change-Id: Iba451357182caf25097b9ae201cd5c647aff3a4f
With this PR our CHI implementation starts making use of the txnid and
DBID identifiers.
Note: we were already making use of the txnId for DVM messages to convey
the DVM address. This is still the case.
In the future we should realign the DVM logic so that the txnId is
solely used as a transaction identifier.
Current we drop the insertion of a whole symbol table if the name
of one symbol already exists in the base table. Having similar
symbols across different binaries is very common.
This change adds a warning and recommends a fix instead of silently
dropping the table. This is useful for debugging when there are two
or more workloads, e.g. bootloader + kernel, are added separately.
Change-Id: I9e4cf06037cd70926fb5cee3c4dab464daf0912e
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
As discussed here, [1], O3CPU counts getWritableRegOperand() as a
reg read, while SimpleCPU variants count getWriableRegOperand()
as a reg write.
This patch fixes this inconsistency. Here, I assume that if
getWritableRegOperand() is used, setReg() will not be used again
to write to the destination register.
[1] https://github.com/gem5/gem5/pull/341
Change-Id: If00049eb598f6722285e9e09419aef98ceed759f
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
Adds a new probe listener template which can be used
to instantiate with a lambda function that is called by
notify(). It is similar to ProbeListenerArg with class but
provides more flexibility. I.e. the can be another object
than the one instantiating the lambda which allows to listen
to any object. Furthermore additional parameters can be
passed in easily.
Change-Id: Iba451357182caf25097b9ae201cd5c647aff3a4f
Signed-off-by: David Schall <david.schall@ed.ac.uk>
At the moment the instruction is disassembled as an integer
operation:
msrNZCV x547, x0
Instead of
msr nzcv x0
Change-Id: I3f6576dccbe86db401c73747750ca3cfdf4055d5
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The change will allow developers to implement and decode their
non-standard instructions to the CPU models
Bug: 289467440
Test: None
Change-Id: I67f4abc71596f819c1265e325784f51c8e9bb359
By default, the --stderr-file and --stdout-file arguments were
directing the simulator output to files named "simerr" and
"simout" respectively if an output redirect was requested.
A small annoyance is these files lack an extension meaning programs
refuse to open them, or to do so without some additional effort. On
many systems they are assumed to scripts.
This patch adds the .txt extension to both, thus clearly indicating
to other programs these are text files and can be opened to be read
as such.
Change-Id: Iff5af4a9e6966b4467d005a029dbf401099fbd35
This allows us to generate stubs for the modules in gem5. The output
will be a "typings" directory which can be used by Pylance (Python
IntelliSense) to infer typings in Visual Studio Code.
Note: A "typings" directory in the root of the workspace is the default
location for Pylance to look for typings. This can be changed via
`python.analysis.stubPath` in "settings.json".
Usage
=====
```
pip3 install -r requirements.txt
scons build/ALL/gem5.opt -j$(nproc)
./build/ALL/gem5.opt util/gem5-stubgen.py
```
The auxv platform string was not copied to the same location that was
pointed to by the value of AT_PLATFORM; instead, it was copied over
the auxv random buffer. This patch fixes this by copying the auxv
platform string to the right offset in the initial program stack.
GitHub issue: https://github.com/gem5/gem5/issues/346
Change-Id: Ied4b660d5fc444a94acb97b799be0a3722438b5e
The popx87 micro-op did not in fact pop the st(0) floating-point
register off the stack; it acted as a no-op. This patch fixes the bug
by passing the spm=1 argument to PopX87's superclass to indicate the
floating-point stack pointer should be incremented.
GitHub issue: https://github.com/gem5/gem5/issues/344
Change-Id: I6e731882b6bcf8f0e06ebd2f66f673bf9da80717
The jal and jalr share the same instruction format JumpConstructor,
which sets the IsCall and IsReturn flags by the register ID. However, it
may cause wrong instruction flags set for jal because the section
"handle the 'Jalr' instruction" misses the opcode checking. The PR fix
the issue to ensure the IsReturn can be only set in Jalr.
It is possible to execute a GPU atomic instruction using a memory
address that is in the host memory space (e.g, HMM, __managed__,
hipHostMalloc'd address). Since these are in host memory they are passed
to the SystemHub DmaDevice. However, this currently executes as a write
packet without modifying data. This leads to hangs in applications that
use atomics for forward progress (e.g., HeteroSync).
It is not clear where these are handled on a real GPU, but they are
certainly not handled by the software stack nor driver, so they must be
handled in hardware and therefore implemented in gem5. Handling for
atomics in the SystemHub makes the most sense.
To make atomics work a few extra changes need to be made to the
SystemHub. (1) The atomic is implemented as a host memory read, followed
by calling the AtomicOpFunctor, followed by a write. This requires a
second event to handle read response, performing atomic, and issuing a
write. (2) Atomics must be serialized otherwise two atomics might return
the same value which is incorrect. This patch adds serialization logic
for all request types to the same address to handle this. (3) With the
added complexity of the SystemHub, a new debug flag explicitly for
SystemHub is added.
Testing done: The heterosync application with input "sleepMutex 10 16 4"
previously hung before this patch. It passes with the patch applied.
This application tests both (1) and (2) above, as it allocates locks
with hipHostMalloc and has multiple workgroups sending an atomic request
in the same Tick, verifying the serialization mechanism.
The implementation of the x86 PACK micro-op had a logical bug that
caused the `PACKSSWB` and `PACKSSDW` instructions to produce incorrect
results. Specifically, due to a signedness error, the overflow check for
negative integers being packed always evaluated to true, resulting in
all negative integers being packed as -1 in the output.
This patch fixes the signedness error that causes the bug.
GitHub issue: https://github.com/gem5/gem5/issues/331
It is possible to execute a GPU atomic instruction using a memory
address that is in the host memory space (e.g, HMM, __managed__,
hipHostMalloc'd address). Since these are in host memory they are passed
to the SystemHub DmaDevice. However, this currently executes as a write
packet without modifying data. This leads to hangs in applications that
use atomics for forward progress (e.g., HeteroSync).
It is not clear where these are handled on a real GPU, but they are
certianly not handled by the software stack nor driver, so they must be
handled in hardware and therefore implemented in gem5. Handling for
atomics in the SystemHub makes the most sense.
To make atomics work a few extra changes need to be made to the
SystemHub. (1) The atomic is implemented as a host memory read, followed
by calling the AtomicOpFunctor, followed by a write. This requires a
second event to handle read response, performing atomic, and issuing a
write. (2) Atomics must be serialized otherwise two atomics might return
the same value which is incorrect. This patch adds serialization logic
for all request types to the same address to handle this. (3) With the
added complexity of the SystemHub, a new debug flag explicitly for
SystemHub is added.
Testing done: The heterosync application with input "sleepMutex 10 16 4"
previously hung before this patch. It passes with the patch applied.
This application tests both (1) and (2) above, as it allocates locks
with hipHostMalloc and has multiple workgroups sending an atomic request
in the same Tick, verifying the serialization mechanism.
Change-Id: Ife84b30037d1447dd384340cfeb06fdfd472fff9
The jal and jalr share the same instruction format JumpConstructor,
which sets the IsCall and IsReturn flags by the register ID.
However, it may cause wrong instruction flags set for jal because
the section "handle the 'Jalr' instruction" misses the opcode
checking. The PR fix the issue to ensure the IsReturn can be only
set in Jalr.
Change-Id: I9ad867a389256f9253988552e6567d2b505a6901
The implementation of the x86 PACK micro-op had a logical bug that
caused the `PACKSSWB` and `PACKSSDW` instructions to produce
incorrect results. Specifically, due to a signedness error, the
overflow check for negative integers being packed always evaluated
to true, resulting in all negative integers being packed as -1 in
the output.
This patch fixes the signedness error that causes the bug.
GitHub issue: https://github.com/gem5/gem5/issues/331
Change-Id: I44b7328a8ce31742a3c0dfaebd747f81751e8851
This splits the CI Tests to one job per sub-directory in "tests/gem5"
via a matrix.
Advantages:
* We can utilize more runners to run the quick tests. This should mean
tests run quicker.
* This approach does not require editing of the workflow as more tests
are added or taken away.
* There is now an output artifact for each directory in "tests/gem5"
instead of one for the entriety of every quick test in "tests".
In addition:
* The artifact retention for the test outputs has been increased to 30 days.
* The output test artifacts have been renamed to be more descriptive of
the job, run, attempt, directory run, and the status.
* The 'tar' step has been removed. GitHub's 'action/artifact' can handle
directories.
Change-Id: I5b3132b424e3769d81d9cd75db2a8c59dbe4a7e5
While there was code present in "serialize.cc" to create the checkpoint
directory, it did not do recursively. This patch ensures all the
directories are created in a path to the checkpoint directory.
Change-Id: Ibcf7f800358fd89946f550b8cfb0cef8b51fceac
This allows for build target information (i.e., the gem5 binary to be
built for the tests) to be returned.
Change-Id: I6638b54cbb1822555f58e74938d36043c11108ba
Now the "tests/main.py" script will accept the `--fixtures` flag when
using the `list` command. This will only list the fixtures needed.
To have this implemented `__str__` for the `Fixture` class has been
implemented.
Change-Id: I4bba26e923c8b0001163726637f2e48c801e92b1
Using just a print was causing this warning to print even with the `-q`
flag was passed. The `-q` flag sets the output to machine readable,
which the warning statement is not.
Change-Id: I139e2565dbc53aaee9027c0e003d34ba800a7ef4
While it makes sense to define the cache_line_size as a 32-bit unsigned int,
the use of cache_line_size is way out of its original scope.
cache_line_size has been used to produce an address mask, which masking out
the offset bits from an address. For example, [1], [2], [3], and [4].
However, since the cache_line_size is an "unsigned int", the type of the
value is not guaranteed to be 64-bit long. Subsequently, the
bit twiddling hacks in [1], [2], [3], and [4] produce 32-bit mask,
i.e., 0x00000000FFFFFFC0.
This behavior at least caused a problem in LLSC in RISC-V [5], where the
load reservation (LR) relies on the mask to produce the cache block address.
Two distinct 64-bit addresses can be mapped to the same cache block using
the above mask.
This patch explicitly defines cache_line_size as a 64-bit unsigned int so
the cache block mask can be produced correctly for 64-bit addresses.
[1] 3bdcfd6f7a/src/cpu/simple/atomic.hh (L147)
[2] 3bdcfd6f7a/src/cpu/simple/timing.hh (L224)
[3] 3bdcfd6f7a/src/cpu/o3/lsq_unit.cc (L241)
[4] 3bdcfd6f7a/src/cpu/minor/lsq.cc (L1425)
[5] 3bdcfd6f7a/src/arch/riscv/isa.cc (L787)
Change-Id: I29abc7aaab266a37326846bbf7a82219071c4ffe
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
When there is race between FwdGetX
and PUTX on owner. Owner in this case hands off
ownership to GetX requestor and PUTX still goes
through. But since owner has changed, state should go back to M and PUTX
is essentially trashed.
An Unblock to the Directory in this case will give an undefined
transition. I have added transitions which indicate that when an Unblock
is served to the Directory, it means that some kind of ownership
transfer has happened while a PUTX/PUTO was in progress.
- Implemented a __str__ for AbstractResource
__str__ prints resource category, id and version.
link to resources website is also printed.
Change-Id: Iad5825ff7d8d505ceb236e00dc49bb56055fc8f0
This allows users to obtain resources via the CLI instead of having to
write a python script to do so. It is essentially a nice CLI wrapper for
"gem5.resources.resource.obtain_resource"
## Usage
```sh
> scons build/ALL/gem5.opt -j `nproc`
> ./build/ALL/gem5.opt util/obtain-resource.py --help
usage: obtain-resource.py [-h] [-p PATH] [-q] id
positional arguments:
id The resource id to download.
options:
-h, --help show this help message and exit
-p PATH, --path PATH The path the resource is to be downloaded to. If not specified, the resource will be downloaded to the default
location in the gem5 local cache of resources
-q, --quiet Suppress output.
```
E.g.:
```sh
./build/ALL/gem5.opt util/obtain-resource.py arm-hello64-static -p arm-hello
```
Will download the resource with ID `arm-hello64-static` to `arm-hello`
in the CWD.
This allows for a user to specify the exact path they want a resource to
be downloaded to. This differs from 'resource_direcctory' in that a user
may specify the file/directory name of the resource (using just the
'resource_directory' will have the resource as its ID in that directory.
Change-Id: I887be6216c7607c22e49cf38226a5e4600f39057
Via this workflow we now can build and push our docker images to the
GitHub Docker container registry:
26a1ee4e61/.github/workflows/docker-build.yaml
GitHub does not charge for downloads to runners (hosted or self-hosted).
This can therefore save the project money if we download from GitHub's
Docker reigstry over Google Cloud's.
This is a test to ensure this works as intended.
The long/daily tests in "tests/gem5/gem5_library_tests" were running in
both the "testlib-long-tests" and the
"testlib-long-gem5_library_example_tests" job in the Daily tests
Workflow. The running in "testlib-long-tests" is removed in this PR.
This change, https://github.com/gem5/gem5/pull/205, mistakenly allocates
write buffer for clflush instruction when there's a cache miss. However,
clflush in gem5 is not a write instruction. Thus, the cache should
allocate miss buffer in this case.
Via this workflow we now can build and push our docker images to
the GitHub Docker container registry:
26a1ee4e61/.github/workflows/docker-build.yaml
GitHub does not charge for downloads to runners (hosted or self-hosted).
This can therefore save the project money if we download from GitHub's
Docker reigstry over Google Cloud's.
This is a test to ensure this works as intended.
Change-Id: Iccdb1b7a912f1e0a0d82b7f888694958099315b3
The long/daily tests in "tests/gem5/gem5_library_tests" were running in
both the "testlib-long-tests" and the
"testlib-long-gem5_library_example_tests" job in the Daily tests
Workflow. The running in "testlib-long-tests" is removed in this patch.
Change-Id: I1c665529e3dcb594ffb7f6e2224077ae366772d6
When there is race between FwdGetX
and PUTX on owner. Owner in this case hands off
ownership to GetX requestor and PUTX still goes
through. But since owner has changed, state should
go back to M and PUTX is essentially trashed.
An Unblock to the Directory in this case will give an undefined
transition. I have added transitions which indicate that when
an Unblock is served to the Directory, it means that some kind
of ownership transfer has happened while a PUTX/PUTO was in
progress.
Change-Id: I37439b5a363417096030a0875a51c605bd34c127
https://github.com/gem5/gem5/actions/runs/6114221855 failure was due to
to running the actions inside our 22.04-all-dependencies container. This
container does not contain docker. We must therefore run this action
outside of the container. However, due to our policy of checking out the
code within this container, we must split this into two jobs and use the
artifact upload and download to get the resources we want.
Links to #293
After calling m5_dump_reset_stats(0,0) in a test program, some
statistics like
l1_controllers.L1Dcache.m_demand_hits,
l1_controllers.L1Dcache.m_demand_misses,
l1_controllers.L1Dcache.m_demand_accesses
were not getting reset in the newer stat dumps.
This one line patch fixes that. Changes were tested with calling two
m5_dump_reset_stats(0,0) in a row for a system with 1 core, tested on
both SE and FS.
Credits: @MeatBoy106
The x87 FPU tag word (FTW) was not explicitly initialized in
{X86_64,i386}Process::initState(), resulting in holding an initial value
of zero, resulting in an invalid x87 FPU state. This commit initializes
FTW to 0xFFFF, indicating the FPU is empty at program start during
syscall emulation.
The 16-bit FTW register was also incorrectly masked down to 8-bits in
X86ISA::ISA::setMiscRegNoEffect(), leading to an invalid X87 FPU state
that later caused crashes in the X86KvmCPU. This commit corrects the
bitwidth of the mask to 16.
GitHub issue: https://github.com/gem5/gem5/issues/303
A recent PR [1] moved the TraceCPU away from the BaseCPU hierarchy.
While the common etrace_replayer.py has been amended, I missed these
hybrid TLM + TraceCPU example scripts.
[1]: https://github.com/gem5/gem5/pull/302
After calling m5_dump_reset_stats(0,0) in a test program,
some statistics like
l1_controllers.L1Dcache.m_demand_hits,
l1_controllers.L1Dcache.m_demand_misses,
l1_controllers.L1Dcache.m_demand_accesses
were not getting reset in the newer stat dumps.
This one line patch fixes that. Changes were tested with
calling two m5_dump_reset_stats(0,0) in a row for a system
with 1 core, tested on both SE and FS.
Credits to Gabriel Busnot for finding the fix.
Change-Id: I19d75996fa53d31ef20f7b206024fd38dbeac643
A recent PR [1] moved the TraceCPU away from the BaseCPU hierarchy.
While the common etrace_replayer.py has been amended, I missed these
hybrid TLM + TraceCPU example scripts.
[1]: https://github.com/gem5/gem5/pull/302
Change-Id: I7e9bc9a612d2721d72f5881ddb2fb4d9ee011587
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This is used by Pylance IntelliSense to infer gem5 typing.
See "util/gem5-stubgen.py" for generating this directory.
Change-Id: Ie39762c718e5392f6194ff7c8238bd0cd677f486
Python 3's `-P` flag, when set, means `sys.path` is not prepended with
potentially unsafe paths:
https://docs.python.org/3/using/cmdline.html#cmdoption-P
This patch allows gem5 to mimic this. This is necesssary when using
`mypy.stubgen` as it expects the Python Interpreter to have the `-P`
flag.
Change-Id: I456c8001d3ee1e806190dc37142566d50d54cc90
The x87 FPU tag word (FTW) was not explicitly initialized in
{X86_64,i386}Process::initState(), resulting in holding an initial
value of zero, resulting in an invalid x87 FPU state. This commit
initializes FTW to 0xFFFF, indicating the FPU is empty at program
start during syscall emulation.
The 16-bit FTW register was also incorrectly masked down to 8-bits
in X86ISA::ISA::setMiscRegNoEffect(), leading to an invalid X87 FPU
state that later caused crashes in the X86KvmCPU. This commit
corrects the bitwidth of the mask to 16.
GitHub issue: https://github.com/gem5/gem5/issues/303
Change-Id: I97892d707998a87c1ff8546e08c15fede7eed66f
The syscall emulation of newfstatat incorrectly treated the output stat
buffer to be of type `OS::tgt_stat`, not `OS::tgt_stat64`, causing the
invalid output stat buffer in the application to hold invalid data.
This patch fixes the bug by simply substituting the type `OS::tgt_stat`
with `OS::tgt_stat64` in `newstatatFunc()`.
GitHub issue: https://github.com/gem5/gem5/issues/281
The syscall emulation of tgkill contained a simple logic bug (a `||`
instead of a `&&`), causing the signal argument to always be considered
invalid. This patch fixes the bug by simply changing the `||` to a `&&`.
GitHub issue: https://github.com/gem5/gem5/issues/284
If the XSAVE KVM capability is available (KVM_CAP_XSAVE), the X86KvmCPU
will try to set the x87 FPU + SSE state using KVM_SET_XSAVE, which
expects a buffer (struct kvm_xsave) in XSAVE area format (Vol. 1, Sec.
13.4 of Intel x86 SDM). The original implementation of
`X86KvmCPU::updateKvmStateFPUXSave()`, however, improperly sets the
xsave header, which contains a bitmap of state components present in the
xsave area.
This patch defines `XSaveHeader` structure to model the xsave header,
which is expected directly following the legacy FPU region (defined in
the `FXSave` structure) in the xsave area. It then sets two bist in the
xsave header to indicate the presence of x86 FPU and SSE state
components.
GitHub issue: https://github.com/gem5/gem5/issues/296
Changes in the PR:
1. Change the vset\*vl\* instructions to jump/branch family, and
implement the branchTarget.
2. Move the Vl and Vtype from decoder to PCState
3. get VL, VTYPE and VLENB value from PCState
4. Remove vtype checking in construction so that the minor and o3 cpu
and decode the instructions after the vset\*vl\*
Now that the TraceCPU is no longer a BaseCPU we disable CPU switching
functionality. AFAICS from the code, it seems like using m5.switchCpus
was never really working.
The takeOverFrom was described as being used when checkpointing
(which is not really the case). Moreover the icache/dcache
event loops were not checking if the CPU was switched out
so the trace was always been consumed regardless of the BaseCPU
state.
Note: IMHO the only case where you might want to switch between
an execution-driven CPU to the TraceCPU is when you want to
warm your caches before the ROI.
All other cases don't really make sense as with the TraceCPU
there is no architectural state being maintained/updated.
Change-Id: I0611359d2b833e1bc0762be72642df24a7c92b1e
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
As we no longer inherit from the BaseCPU, we can't really use
CPU generation methods (like Simulation.setCPUClass) and
cache generation ones (like CacheConfig.config_cache).
This is good news as it allows us to simplify the etrace
script and to remove a dependency with the deprecated-to-be
common library.
Change-Id: Ic89ce2b9d713ee6f6e11bf20c5065426298b3da2
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This is fixing a recently reported issue [1] where it is
not possible to use the TraceCPU to replay elastic traces
It requires some architectural data structures (like ArchMMU,
ArchDecoder...) which are no longer defined in the BaseCPU class at
compilation time. Which Arch version should be used for a class
(TraceCPU) that is supposed to be ISA agnostic ? Does it really make
sense to define them for the TraceCPU? Those classes are not used anyway
during trace replay and their sole purpose would just be to comply to
the BaseCPU interface.
As there is no elegant way to make things work, this patch stops
treating the TraceCPU as a BaseCPU.
While it philosophically makes sense to treat the TraceCPU as a common
CPU (it sort of replays pre-executed instructions), a case can be made
for considering it more like a traffic generator.
[1]: https://github.com/gem5/gem5/issues/301
Change-Id: I7438169e8cc7fb6272731efb336ed2cf271c0844
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The changes includes:
1. Add VL, Vtype and VlenbBits operands
2. Change R/W methods of VL, Vtype and VlenbBits from PCState
Change-Id: I0531ddc14344f2cca94d0e750a3b4291e0227d54
We should not try to check vtype when decoding the instruction.
It should be checked in vset{i}vl{i} since the register can be
modified via vset{i}vl{i}
Change-Id: I403e5c4579bc5b8e6af10f93eac20c14662e4d2d
This adds the setting up of environment variables to the gcn-gpu
Dockerfile from the halofinder Dockerfile in gem5-resources so that we
don't need to use a separate Docker image in the gpu tests.
This reverts commit 0249d47acc,
https://github.com/gem5/gem5/pull/271.
This solution doesn't work. GitHub runners pull the images they need at
the start of job (i.e., all the images they may need for each step).
They then create the containers later, at the step they are needed. This
solution therefore breaks in the case a cleanup happens during the
running of a job. I.e., a `docker system prune` happens after setup,
therefore deleting all the images, then the job tries to use one of the
images during a step.
This crontab solution may work if we can only do it when the runner is
in an idle state. Whether this is possible is unknown.
This reverts commit 0249d47acc,
https://github.com/gem5/gem5/pull/271.
This solution doesn't work. GitHub runners pull the images they need at
the start of job (i.e., all the images they may need for each step).
They then create the containers later, at the step they are needed.
This solution therefore breaks in the case a cleanup happens during the
running of a job. I.e., a `docker system prune` happens after setup,
therefore deleting all the images, then the job tries to use one of the
images during a step.
This crontab solution may work if we can only do it when the runner is
in an idle state. Whether this is possible is unknown.
Change-Id: I7cb5b2d98d596e9380ae1525c7d66ad97af1b59b
If the XSAVE KVM capability is available (KVM_CAP_XSAVE), the X86KvmCPU
will try to set the x87 FPU + SSE state using KVM_SET_XSAVE, which
expects a buffer (struct kvm_xsave) in XSAVE area format (Vol. 1,
Sec. 13.4 of Intel x86 SDM). The original implementation of
`X86KvmCPU::updateKvmStateFPUXSave()`, however, improperly sets the
xsave header, which contains a bitmap of state components present
in the xsave area.
This patch defines `XSaveHeader` structure to model the xsave header,
which is expected directly following the legacy FPU region (defined in
the `FXSave` structure) in the xsave area. It then sets two bist in
the xsave header to indicate the presence of x86 FPU and SSE state
components.
GitHub issue: https://github.com/gem5/gem5/issues/296
Change-Id: I5c5c7925fa7f78a7b5e2adc209187deff53ac039
The syscall emulation of tgkill contained a simple logic bug
(a `||` instead of a `&&`), causing the signal argument to always
be considered invalid. This patch fixes the bug by simply changing
the `||` to a `&&`.
GitHub issue: https://github.com/gem5/gem5/issues/284
Change-Id: I3b02c618c369ef56d32a0b04e0b13eacc9fb4977
Modify the FDArray::unserialize function to perform a checkPathRedirect
if a Process pointer is passed in.
Currently when restoring a checkpoint, it doesn't perform
checkPathRedirect for files that were opened during checkpointing. This
patch adds a checkPathRedirect in the FDArray::unserialize to redirect
app path for restoring checkpoints.
The syscall emulation of newfstatat incorrectly treated the output
stat buffer to be of type `OS::tgt_stat`, not `OS::tgt_stat64`, causing
the invalid output stat buffer in the application to hold invalid
data.
This patch fixes the bug by simply substituting the type `OS::tgt_stat`
with `OS::tgt_stat64` in `newstatatFunc()`.
GitHub issue: https://github.com/gem5/gem5/issues/281
Change-Id: Ice97c1fc4cccbfb6824e313ebecde00f134ebf9c
To use mold linker with gcc of version older than 12.1.0, the user has
to pass the -B option to specify where the linker is. [1]
Currently in gem5, scons only looks for the mold binary at conventional
places, such as /usr/libexec/mold and /usr/local/libexec/mold. There's
no option to manually specify the path to the linker.
gcc-12 and mold are not widely available on older systems. Having an
option to manually input the linker path allows user to build and use
mold without sudo permission.
[1] https://github.com/rui314/mold#how-to-use
This change, https://github.com/gem5/gem5/pull/205, mistakenly
allocates write buffer for clflush instruction when there's a
cache miss. However, clflush in gem5 is not a write instruction.
Thus, the cache should allocate miss buffer in this case.
Change-Id: I9c1c9b841159c4420567e9c929e71e4aa27d5c28
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
This adds the setting up of environment variables to the gcn-gpu
Dockerfile from the halofinder Dockerfile in gem5-resources
so that we don't need to use a seperate Docker image in the
gpu tests.
Change-Id: Ifcc7a4c6bbcd5289ce9561923366e9ed193f170c
This commit fixes a crash in the syscall emulation of the chdir(2)
syscall, implemented by chdirFunc() in src/sim/syscall_emul.cc, when
passed a nonexistent directory. The buggy code did not check the return
value of realpath().
This patch adds code to check the return value of realpath(), and if it
is NULL (i.e., there was an error with the requested directory to change
to), propagates the error in `errno` to the application.
GitHub issue: https://github.com/gem5/gem5/issues/276
This will hold the CHI Data Buffer Identifier (DBID) field.
The DBID allows a Completer of a transaction to provide its own
identifier for a transaction ID.
This new ID will be used as a TxnId field by a following
WriteData/CompData/CompAck response.
For now we only set it to the original txnId (identity mapping)
Change-Id: If30c5e1cafbe5a30073c7cd01d60bf41eb586cee
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The TxnId field of a CHI request has so far been unused (other than for
DVM transactions). With this patch we always initialize the field when
we extract a ruby request from the sequencer port.
According to specs (IHI0050F):
A 12-bit field is defined for the TxnID with the number of outstanding
transactions being limited to 1024. A Requester is permitted to reuse a
TxnID value after it has received either:
* All responses associated with a previous transaction that have used
the same value.
* A RetryAck response for a previous transaction that used the same value
Change-Id: Ie48f0fee99966339799ac50932d36b2a927b1c7d
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Based on the CHIRequestType, it automatically tells if the
request has been originated from the sequencer
(CPU load/fetch/store)
Change-Id: I50fd116c8b1a995b1c37e948cd96db60c027fe66
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
At the moment an address value can only be used in the slicc code
to do TBE lookups but there is no way to add/subtract/divide/multiply
two addresses nor an address and an integer value.
This hinders the development of protocol specific code and
forces developers to place such code in shared
C++ structures
Change-Id: Ia184e793b6cd38f951f475a7cdf284f529972ccb
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
At the moment it is possible to static_cast by pointer/reference only:
static_cast(type, "pointer", val) -> static_cast<type*>(val);
static_cast(type, "reference", val) -> static_cast<type&>(val);
With this patch it will also be possible to do something like
static_cast(type, "value", val) -> static_cast<type>(val);
Which is important when wishing to convert integer types into
custom onces and viceversa.
This patch is also deferring static_cast type check to C++
At the moment it is difficult to use the static_cast utility in slicc as
it tries to handle type checking in the language itself. This would
force us to explicitly define compatible types (like an Addr and an int
as an example). Rather than pushing the burden on us, we should always
allow a developer to use a static_cast in slicc and let the C++ compiler
complain if the generated code is not compatible
Change-Id: I0586b9224b1e41751a07d15e2d48a435061c2582
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
https://github.com/gem5/gem5/actions/runs/6114221855 failure was due
to to running the actions inside our 22.04-all-dependencies container.
This container does not contain docker. We must therefore run this action
outside of the container. However, due to our policy of checking out the
code within this container, we must split this into two jobs and use the
artifact upload and download to get the resources we want.
Change-Id: I6a5f9c3a4c287a56a3d5abe3b84dd560fa2e9ff1
Introduced in https://github.com/gem5/gem5/pull/236 the
"docker-build.yaml" file will allow us to build and push docker images
to the GitHub Container Registry. This allows for both automation of
docker image building and allows us to utilize Github's zero-cost
pulling policy for downloads to GitHub Actions runners.
In this PR https://github.com/gem5/gem5/pull/236 has been altered to use
Docker `buildx` which allows for multi-platform Docker Image builds. A
multi-platform Docker image pull automatically pull the correct image
for your platform from a single URL. In this prototype the images are
build to both `linux/arm64` and `linux/amd64` have been set.
Docker `buildx` has it's own file format for specifying image builds
called `bake`. "util/dockerfiles/docker-bake.hcl" has been added with
the goal of replacing "util/dockerfiles/docker-compose.yaml".
In this proof-of-concept doesn't build all our docker images, just
enough to ensure it works inside our actions as intended.
When reset a port, we don't want to trigger a onChange(). Offer an
option to bypass it and update state only.
Change-Id: Ia53b7a76d2a320ea67101096cdbfe2eafaf440d2
sim/fd_array.hh:
Add "class Process;" to forward declare Process for unserialize
function to pass in a Process object pointer.
Fix the styling issue with include files.
sim/fd_array.cc"
Add comments.
Change-Id: Ifb21eb1c7bad119028b8fd8e610a125100fde696
When reset a port, we don't want to trigger a onChange().
Offer an option to bypass it and update state only.
Change-Id: Ia53b7a76d2a320ea67101096cdbfe2eafaf440d2
This commit fixes a crash in the syscall emulation of the chdir(2)
syscall, implemented by chdirFunc() in src/sim/syscall_emul.cc,
when passed a nonexistent directory. The buggy code did not check
the return value of realpath().
This patch adds code to check the return value of realpath(), and
if it is NULL (i.e., there was an error with the requested directory
to change to), propagates the error in `errno` to the application.
GitHub issue: https://github.com/gem5/gem5/issues/276
Change-Id: I8a576f60fe3687f320d0cfc28e9d3a6b477d7054
Though in "ext" this directory is regularly modified. `pre-commit`
should run on these files.
This PR includes running `pre-commit run --files ext/testlib` to
reformat the files in "ext/testlib" using Python Black.
This patch fixes the buggy special path comparisons in
src/kern/linux/linux.cc Linux::openSpecialFile(), which only checked for
equality of path prefixes, but not equality of the paths themselves.
This patch replaces those buggy comparisons with regular
std::string::operator== string equality comparisons.
GitHub issue: https://github.com/gem5/gem5/issues/269
This ensures that if the CI tests are running for a PR, and a new
workflow is triggered (typically by pushing/rebasing the PR) then the
older workflow is cancelled.
This ensures that if the CI tests are running for a PR, and a new
workflow is triggered (typically by pushing/rebasing the PR) then the
older workflow is cancelled.
Change-Id: Ifa172bdbdac09c5a91abb41a0162c597445e4e2e
The bug report template used escape characters. This is not necessary as
the bug report is not rendered when creating a bug report. It is
displayed to the user in plain text for them to edit.
In addition languages have been added to the code-blocks and newlines
have been added and removed where appropriate to cleanup the document.
Introduced in https://github.com/gem5/gem5/pull/236 the
"docker-build.yaml" file will allow us to build and push docker images
to the GitHub Container Registry. This allows for both automation of
docker image building and allows us to utilize Github's zero-cost
pulling policy for downloads to GitHub Actions runners.
In this PR https://github.com/gem5/gem5/pull/236 has been altered to
use Docker `buildx` which allows for multi-platform Docker Image builds.
A multi-platform Docker image pull automatically pull the correct image
for your platform from a single URL. In this prototype the images are
build to both `linux/arm64` and `linux/amd64` have been set.
Docker `buildx` has it's own file format for specifying image builds
called `bake`. "util/dockerfiles/docker-bake.hcl" has been added with
the goal of replacing "util/dockerfiles/docker-compose.yaml".
In this proof-of-concept doesn't build all our docker images, just
enough to ensure it works inside our actions as intended.
Change-Id: Id0debed216c91ec514aa4fce3bc2ff4fc2ea669b
This patch fixes the buggy special path comparisons in
src/kern/linux/linux.cc Linux::openSpecialFile(), which only checked
for equality of path prefixes, but not equality of the paths
themselves. This patch replaces those buggy comparisons with
regular std::string::operator== string equality comparisons.
GitHub issue: https://github.com/gem5/gem5/issues/269
Change-Id: I216ff8019b9a6a3e87e364c2e197d9b991959ec1
Though in ext, we regularly modify these files to add features and
extend our testlib testing infrastructure. Ergo, the pre-commit checks
should be run.
Change-Id: I921a263f25f850b03e5535a8a1f509921c124763
Currently the MOESI_AMD_Base-directory transition for system level
atomics sends the response message before the atomic is performed. This
was likely done because atomics are supposed to return the value of the
data *before* the atomic is performed and by simply ordering the actions
this way that was taken care of.
With the new atomic log feature, the atomic values are pulled from the
log by the coalescer on the return path. Therefore, these actions can be
reordered. In fact, it is now necessary that the atomics be performed
before sending the response so that the log is populated and copied by
the response action. This should fix#253 .
There were some weird newline characters in this file, or lack of lines.
This patch adds/removes them.
Change-Id: I6cc918788c07bbc4be5c68401ad3987be00fffc4
The bug_report.md is rendered as plain text, not markdown, when creating
a bug report. As such the escape characters are removed in this commit.
Change-Id: I524c66ae61d00b7ed59153ba9f4b2297ff50ee18
Currently the MOESI_AMD_Base-directory transition for system level
atomics sends the response message before the atomic is performed. This
was likely done because atomics are supposed to return the value of the
data *before* the atomic is performed and by simply ordering the actions
this way that was taken care of.
With the new atomic log feature, the atomic values are pulled from the
log by the coalescer on the return path. Therefore, these actions can be
reordered. However, it is now necessary that the atomics be performed
before sending the response so that the log is populated and copied by
the response action. This should fix#253 .
Change-Id: Ie7e178f93990975367de2cc3e89e5ef9c9069241
To use mold linker with gcc of version older than 12.1.0, the user
has to pass the -B option to specify where the linker is. [1]
Currently in gem5, scons only looks for the mold binary at
conventional places, such as /usr/libexec/mold and
/usr/local/libexec/mold. There's no option to manually specify
the path to the linker.
gcc-12 and mold are not widely available on older systems. Having
an option to manually input the mold linker path allows users to use
built mold instances anywhere on the system, not just the default
locations in /usr where they may not have permission to install mold
(i.e., no sudo permissions).
[1] https://github.com/rui314/mold#how-to-use
Change-Id: Ifb2366a0c2342bf4e7207df8db6196e14184a9d4
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
gdb was originally part of the ROCm 1.6 Dockerfile a few years ago. It
got removed when updating to ROCm 4.0. This adds it back as being able
to debug things is quite useful.
This isn't necessary. Without 'run-name' the action's default name is
'run-name'. Displaying the actor who launched the action is pointless
for scheduled tests.
Starting with gfx900 (Vega) the LDS and scratch apertures can be queried
using a new s_getreg_b32 instruction. If the instruction is called with
the SH_MEM_BASES argument it returns the upper 16 bits of a 64 bit
address for the LDS and scratch apertures. The current addresses cannot
be encoded in this register, so that addresses are changed to have the
lower 48 bits be all zeros in addition to writing the bases register.
The RM field of ModRM was printed as Reg field for several instructions.
For reference, this change fixes typos introduced by [1].
[1] https://gem5-review.googlesource.com/c/public/gem5/+/40339
Change-Id: I41eb58e6a70845c4ddd6774ccba81b8069888be5
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
gdb was originally part of the ROCm 1.6 Dockerfile a few years ago. It
got removed when updating to ROCm 4.0. This adds it back as being able
to debug things is quite useful.
Change-Id: I3f8148cde79e6cc5233fa3c8c830b64817f01d3a
Starting with gfx900 (Vega) the LDS and scratch apertures can be queried
using a new s_getreg_b32 instruction. If the instruction is called with
the SH_MEM_BASES argument it returns the upper 16 bits of a 64 bit
address for the LDS and scratch apertures. The current addresses cannot
be encoded in this register, so that addresses are changed to have the
lower 48 bits be all zeros in addition to writing the bases register.
Change-Id: If20f262b2685d248afe31aa3ebb274e4f0fc0772
This isn't necessary. Without 'run-name' the action's default name is
'run-name'. Displaying the actor who launched the action is pointless
for scheduled tests.
Change-Id: I15d52959389881381ef7685efb57152c5162c89d
Augmenting the DataBlock class with a change log structure to record the
effects of atomic operations on a data block and service these changes
if the atomic operations require return values.
Although the operations are atomic, the coalescer need not send unique
memory requests for each operation. Atomic operations within a wavefront
to the same address are now coalesced into a single memory request. The
response of this request carries all the necessary information to
provide the requesting lanes unique values as a result of their
individual atomic operations. This helps reduce contention for request
and response queues in simulation.
Previously, only the final value of the datablock after all atomic ops
to the same address was visible to the requesting waves. This change
corrects this behavior by allowing each wave to see the effect of this
individual atomic op is a return value is necessary.
This is built to test the following assumptions:
1. We can trigger a GitHub action event on the changing of a
file/directory.
2. We can use GitHub actions to build a docker image.
3. We can use GitHub actions to push a docker image to a container
registry.
4. We can use GitHub's container registry.
Right now this will only build and push ubuntu-20.04_all-depenencies, as
a test.
This patch allows a local JSON file to specify a local path in the JSON
object of a Resource, through the "url" field.
Local paths can be entered with the prefix "file:" in the "url" field.
If the local path exists, then the Resource from there is copied into
the resource directory defined in the
function earlier.
This behavior is the same as using specific Resource classes (ex.
BinaryResource) and passing a local_path into the function.
But, the above class does not allow simultaneous creation of local
Resources and Workloads of those local Resources.
With this patch, someone can use a local JSON, specify the location of
local Resources and create a Workload from those Resources and test both
together.
This patch allows a local JSON file to specify a local path
in the JSON object of a Resource, through the "url" field.
Local paths can be entered with the prefix "file:" in "url".
All File URI scheme formats are supported.
This behavior is the same as using specific Resource classes
(ex. BinaryResource) and passing a local_path into the function.
But, the above infrastructure does not allow simultaneous
creation of Resources and Workloads of those Resources.
With this patch, someone can use a local JSON, specify the location
of local Resources and create a Workload from those Resources and
test both together.
Also, this patch adds pyunit tests to check the functionality
of the function used to convert the "url" field into a path.
Change-Id: I1fa3ce33a9870528efd7751d7ca24c27baf36ad4
There is conversion error in ./util/streamline/m5stats2streamline.py
script to convert gem5 stats.txt,sys, system.tasks.txt to the apc folder
required by DS-5 streamline. The fix to the bug can convert to apc
folder without error. The zipped apc folder can then be imported in
older DS-5 v5.24 for visualization (didn't work with DS-5 v5.29).
Changes:
1) writeBinary function binary_list can have either string or ints and
it needs to be properly converted to bytes
2) packed32(x) function can have x as int or float. Incase of float it
needs to be converted to int
The bug was reported and solved primarily in the issue
https://github.com/gem5/gem5/issues/145
Change-Id: I6a52aa59e1582dd6bb06b2d1c49ddaf8fe61c997
This is built to test the following assumptions:
1. We can trigger a GitHub action event on the changing of a
file/directory.
2. We can use GitHub actions to build a docker image.
3. We can use GitHub actions to push a docker image to a container
registry.
4. We can use GitHub's container registry.
Right now this will only build and push ubuntu-20.04_all-depenencies, as
a test.
Change-Id: Ie1a55c97c6eef26281456c908e1200b27da4d961
These allow visitors to the repository to quickly see the status of our
tests run on the develop branch.
Change-Id: I3658c0e0d9dea66feebd69588c8a29d369a0b43d
DCT must be disabled when handling a ReadUnique where the copy need to
be upgraded.
Previously we were just asserting as it was assumed DCT is only enabled
for HNFs (which can "auto-upgrade"). However DCT may also be enabled for
intermediated levels of distributed shared caches above the HNFs.
Currently we generate these stats for all defined Events in the
protocol, which may generate too many stats that are never used. Though
these don't appear in the stats.txt file, they unnecessarily increases
simulation startup time and memory footprint.
This patch limits those stats to events with the "in_trans" and/or
"out_trans" properties. SLICC compiler then checks which combinations of
event+state are possible when generating the stats.
Also the possible level of detail for inTransLatHist was reduced.
Only the number of transactions for each event+initial+final state
combinations is now accounted. Latency histograms are only defined per
event type (similarly to outTransLatHist). This significantly reduces
the final file size for generated stats.
1) writeBinary function binary_list can have either string or ints and it needs to be properly converted to bytes
2) packed32(x) function can have x as int or float. Incase of float it needs to be converted to int
3) encode lines to string using .decode() or else TypeError will be invoked during run
Change-Id: I678169f191901f02a80187418a17adbc1240c7d3
1) writeBinary function binary_list can have either string or ints and it needs to be properly converted to bytes
2) packed32(x) function can have x as int or float. Incase of float it needs to be converted to int
Change-Id: I6a52aa59e1582dd6bb06b2d1c49ddaf8fe61c997
Flat scratch instructions (aka private) are the 3rd and final segment of
flat instructions in gfx9 (Vega) and beyond. These are used for things
like spills/fills and thread local storage. This commit enables two
forms of flat scratch instructions: (1) flat_load/flat_store
instructions where the memory address resolves to private memory and (2)
the new scratch_load/scratch_store instructions in Vega. The first are
similar to older generation ISAs where the aperture is unknown until
address translation. The second are instructions guaranteed to go to
private memory.
Since these are very similar to flat global instructions there are
minimal changes needed:
- Ensure a flat instruction is either regular flat, global, XOR scratch
- Rename the global op_encoding methods to GlobalScratch to indicate
they are for both and are intentionally used.
- Flat instructions in segment 1 output scratch_ in the disassembly
- Flat instruction executed as private use similar mem helpers as global
- Flat scratch cannot be an atomic
This was tested using a modified version of the 'square' application:
template <typename T>
__global__ void
scratch_square(T *C_d, T *A_d, size_t N)
{
size_t offset = (blockIdx.x * blockDim.x + threadIdx.x);
size_t stride = blockDim.x * gridDim.x ;
volatile int foo; // Volatile ensures scratch / unoptimized code
for (size_t i=offset; i<N; i+=stride) {
foo = A_d[i];
C_d[i] = foo * foo;
}
}
Change-Id: Icc91a7f67836fa3e759fefe7c1c3f6851528ae7d
According to the ABI documentation from LLVM, the *low* register of flat
scratch (maxSGPR - 4) is the offset and the high register (maxSGPR - 3)
is size. These are currently backwards, resulting in some gnarly
addresses being generated leading to page fault and/or incorrect data.
This commit fixes this by setting the order correctly.
Change-Id: I0b1d077c49c0ee2a4e59b0f6d85cdb8f17f9be61
Flat instructions may access memory locations in LDS (scratchpad) and
global (VRAM/framebuffer) and therefore increment both counters when
dispatched. Once the aperture is known, we decrement the counters of the
aperture that was *not* used. This is done incorrectly for scratch /
private flat instruction. Private memory is global and therefore local
memory counters should be decremented.
This commit fixes the counters by changing the global decrements to
local decrements.
Change-Id: I25890446908df72e5469e9dbaba6c984955196cf
The functional HSA signal read was a hack left in the gpu-compute code.
In full system, this functional read is causing problems occasionally
with the translation not yet being in the page table. The error message
output by gem5 was a fatal message on the readBlob method in port proxy.
Changing this to a timing DMA fixes this problem.
This commit adds the various timing DMA functions to send and receive
response and clean up. A helper method "sendCompletionSignal" is added
to the GPUCommandProcessor because the indentation level was getting too
deep. This change applies only to FS mode. Code for SE mode is
equivalent to what it was before this commit.
Change-Id: I1bfcaa0a52731cdf9532a7fd0eb06ab2f0e09d48
The functional HSA signal read was a hack left in the gpu-compute code.
In full system, this functional read is causing problems occasionally
with the translation not yet being in the page table. The error message
output by gem5 was a fatal message on the readBlob method in port proxy.
Changing this to a timing DMA fixes this problem.
This commit adds the various timing DMA functions to send and receive
response and clean up. A helper method "sendCompletionSignal" is added
to the GPUCommandProcessor because the indentation level was getting too
deep. This change applies only to FS mode. Code for SE mode is
equivalent to what it was before this commit.
Change-Id: I1bfcaa0a52731cdf9532a7fd0eb06ab2f0e09d48
This updates all the jobs for our CI tests to make sure they
don't run tests on draft pull request, and only trigger when
ready for review
Change-Id: I3fe7ae373c39fc6ef594c0c71c6f10e7319553d8
This is an experiment. The runners were sometimes running out of memory
building gem5. The builders have more memory so should be able to
handling this. The runners have 4-cores so compilation should be faster
(note the inclusion of the `-j$(nproc)`.
configs,dev-amdgpu: Add PCI express capability info
The ROCm stack requires PCI express atomics. Currently the first PCI
CapabilityPtr does not point to anything, which signals to the OS
(Linux) that this is an early generation PCI device. As PCI express
atomics were introduced later, the CapabilityPtr needs to point to at
least a PCI express capability structure. This capability is defined as
0x10 in Linux. We additionally set the PCI atomic based bits and
implement device specific PCI configuration space reads and writes to
the amdgpu device.
The second commit, output of simulation when loading the amdgpu
driver no longer outputs "PCIE atomics not supported". Further, an
application which uses PCIe atomics (PyTorch with a reduce_sum kernel)
now makes further progress.
First commit is a minor typo fix changing PCI capability struct to
union.
In the RISC-V system, we need to VecClassReg to run RISC-V vector
instruction, and VecElemReg is not applicable because the element length
of vector can be resizable via vset\*vl\* instruction.
The change will seperate the reg_index for VecReg and VecElemReg to
ensure that have the space for VecReg when VecElemReg is not applicable.
When an Evict request is received from upstream for a shared line and
the line is no longer cached locally (or on any other upstream cache),
we need to also send an Evict downstream. In this case we need to wait
until our outgoing Evict completes before completing the Evict from
upstream in order be able to resolve race conditions with incoming
snoops. E.g.: while our outgoing Evict is pending we may receive a snoop
requesting data, but we won't be able to complete this snoop if we have
already completed all upstream Evicts and we no longer have the line.
There are a few LDS instructions that perform local ALU operations and
writeback which are marked as loads. These are marked as loads because
they fit in the pipeline logic better, according to a several year old
comment. In the VEGA ISA these instructions (swizzle, permute, bpermute)
are not decrementing the LDS load counter. As a result, the counter will
gradually increase over time. Since wavefront slots are persistent, this
can cause applications with a few thousand kernels to eventually hang
thinking there are not enough resources.
This changeset fixes this by decrementing the LDS load counter for these
instructions. This fix was already integrated in the GCN3 ISA in the
exact same way. This changeset moves it near a similar comment about
scheduling register file writes.
Change-Id: Ife5237a2cae7213948c32ef266f4f8f22917351c
The ROCm stack requires PCI express atomics. Currently the first PCI
CapabilityPtr does not point to anything, which signals to the OS
(Linux) that this is an early generation PCI device. As PCI express
atomics were introduced later, the CapabilityPtr needs to point to at
least a PCI express capability structure. This capability is defined as
0x10 in Linux. We additionally set the PCI atomic based bits and
implement device specific PCI configuration space reads and writes to
the amdgpu device.
With this commit, the output of simulation when loading the amdgpu
driver no longer outputs "PCIE atomics not supported". Further, an
application which uses PCIe atomics (PyTorch with a reduce_sum kernel)
now makes further progress.
Change-Id: I5e3866979659a2657f558941106ef65c2f4d9988
The goal is to fix this issue which appears to be affects some Apple
users: https://github.com/gem5/gem5/issues/94.
By specializing the `EXC_*` to gem5 we avoid the name conflicts plagiing
some users.
In the RISC-V system, we need to VecClassReg to run RISC-V vector
instruction, and VecElemReg is not applicable because the element
length of vector can be resizable via vset*vl* instruction.
The change will seperate the reg_index for VecReg and VecElemReg to
ensure that have the space for VecReg when VecElemReg is not
applicable.
Change-Id: I99a82dec273baeee31df89a0ee0f5e87f3ff187c
The capabilities for PCI express is a struct, instead of a union, like
the other capability unions. A union is used here to provide access to
the ordinal data values when reading/writing an offset while
simultaneously providing human readable field values that can be set
when writing the code.
This commit changes it to union which is likely should be. Nothing
appears to be using this union yet so it is likely an oversight.
Change-Id: I85fe7cc62914525c70fd7a5946d725ed308f8775
There are a few LDS instructions that perform local ALU operations and
writeback which are marked as loads. These are marked as loads because
they fit in the pipeline logic better, according to a several year old
comment. In the VEGA ISA these instructions (swizzle, permute, bpermute)
are not decrementing the LDS load counter. As a result, the counter will
gradually increase over time. Since wavefront slots are persistent, this
can cause applications with a few thousand kernels to eventually hang
thinking there are not enough resources.
This changeset fixes this by decrementing the LDS load counter for these
instructions. This fix was already integrated in the GCN3 ISA in the
exact same way. This changeset moves it near a similar comment about
scheduling register file writes.
Change-Id: Ife5237a2cae7213948c32ef266f4f8f22917351c
This is an experiment. The runners were sometimes running out of memory
building gem5. The builders have more memory to handle this. The runners
have 4-cores so compilation should be faster (note the inclusion of the
`-j$(nproc)`.
Change-Id: I964c5a778938b449502d92dec3431f8b788397e4
Marks which events signal the beginning of incoming and outgoing
transactions for generating inTransLatHist and outTransLatHist stats.
Change-Id: I90594a27fa01ef9cfface309971354b281308d22
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Generating these stats for all defined Events may generate too many
stats that are never used, which unnecessarily increases simulation
startup time and memory consumption.
This patch limits those stats to events with the "in_trans" and/or
"out_trans" properties. SLICC compiler then checks which combinations
of event+state are possible when generating the stats.
Also the possible level of detail for inTransLatHist was reduced.
Only the number of transactions for each event+initial+final state
combinations is now accounted. Latency histograms are only defined
per event type (similarly to outTransLatHist). This significantly
reduces the final file size for generated stats.
Change-Id: I29aaeb771436cc3f0ce7547a223d58e71d9cedcc
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Do not respond with SnpRespData_I when the line is still present
upstream.
Change-Id: I2592e5c6637cfc0e83042169a245837648276e61
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
DCT must be disabled when handling a ReadUnique where the copy
need to be upgraded.
Previously we were just asserting as it was assumed DCT is only enabled
for HNFs (which can "auto-upgrade"). However DCT may also be enabled
for intermediated levels of distributed shared caches above the HNFs.
Change-Id: I9e29142a8d2f59ea61c1d90cda6b00c19435d6b7
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
- Added deprecated warnings to Workload and Abstract workload.
- Added comments to the classes changed.
Change-Id: I671daacf5ef455ea65103bd96aa442486142a486
When an Evict request is received from upstream for a shared line
and the line is no longer cached locally (or on any other upstream
cache), we need to also send an Evict downstream. In this case we need
to wait until our outgoing Evict completes before completing the Evict
from upstream in order be able to resolve race conditions with incoming
snoops. E.g.: while our outgoing Evict is pending we may receive a
snoop requesting data, but we won't be able to complete this snoop if
we have already completed all upstream Evicts and we no longer have the
line.
Change-Id: I23ac4f0a9c4ddd81e2425376c8d1e1c7fb66d107
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Augmenting the DataBlock class with a change log structure to
record the effects of atomic operations on a data block and
service these changes if the atomic operations require return
values.
Although the operations are atomic, the coalescer need not
send unique memory requests for each operation. Atomic
operations within a wavefront to the same address are now
coalesced into a single memory request. The response of this
request carries all the necessary information to provide the
requesting lanes unique values as a result of their individual
atomic operations. This helps reduce contention for request
and response queues in simulation.
Previously, only the final value of the datablock after all
atomic ops to the same address was visible to the requesting
waves. This change corrects this behavior by allowing each wave
to see the effect of this individual atomic op is a return value
is necessary.
Change-Id: I639bea943afd317e45f8fa3bff7689f6b8df9395
1. The current SignalSinkPort and SignalSourcePort have no
ways to assign the init value of the state. Add a new constructor
for them with the param init_state
2. After the source and sink are bound, the state at both side should
be the same. Set the the state of sink to the state of source in the
bind() function.
Change-Id: Idde0a12aa0ddd0c9c599ef47059674fb12aa5d68
Compilation bug found on:
https://github.com/gem5/gem5/actions/runs/5899831222/job/16002984553
In gcc Version 8 and below the following error is received:
```
src/base/bitfield.hh: In function ‘constexpr int gem5::findLsbSet(uint64_t)’:
src/base/bitfield.hh:365:34: error: call to non-‘constexpr’ function ‘int gem5::{anonymous}::findLsbSetFallback(uint64_t)’
return findLsbSetFallback(val);
~~~~~~~~~~~~~~~~~~^~~~~
scons: *** [build/ALL/kern/linux/events.o] Error 1
```
`findLsbSet` cannot be `constexr` as it calls non-constexpr function
`findLsbSetFallback`. `findLsbSetFallback`. The problematic function is
the `count` on the std::bitset.
This patch changes this to a constexpr.
Upload the config script to make it only for riscv asmtest and replace
Resource with obtain_resourse
Change-Id: I0bab96ea352b7ce1c6838203bfa13eee795f41f9
The goal is to fix this issue which appears to be affects some Apple
users: https://github.com/gem5/gem5/issues/94.
By specializing the `EXC_*` to gem5 we avoid the name conflicts plagiing
some users.
Change-Id: I031f7110b4b4ae82677b6586903cd57b22ca2137
Compilation bug found on:
https://github.com/gem5/gem5/actions/runs/5899831222/job/16002984553
In gcc Version 8 and below the following error is received:
```
src/base/bitfield.hh: In function ‘constexpr int gem5::findLsbSet(uint64_t)’:
src/base/bitfield.hh:365:34: error: call to non-‘constexpr’ function ‘int gem5::{anonymous}::findLsbSetFallback(uint64_t)’
return findLsbSetFallback(val);
~~~~~~~~~~~~~~~~~~^~~~~
scons: *** [build/ALL/kern/linux/events.o] Error 1
```
`findLsbSet` cannot be `constexr` as it calls non-constexpr function
`findLsbSetFallback`. `findLsbSetFallback`. The problematic function is
the `count` on the std::bitset.
This patch changes this to a constexpr.
Change-Id: I48bd15d03e4615148be6c4d926a3c9c2f777dc3c
When a linux kernel changes a page property, it flushes the related
cache lines. The kernel might change the page property before flushing
the cache lines. This results in the clflush might occur in an
uncacheable region.
Currently, an uncacheable request must be a read or a write. However,
clflush request is neither of them.
This change aims to allow clflush requests to work on uncacheable
regions. Since there is no straightforward way to check if a packet is
from a clflush instruction, this change permits all Clean Invalidate
Requests, which is the type of request produced by clflush, to work on
uncacheable regions.
This fixes:
1. Most importantly: The submodule recursive update was incorrect. This
adds the recursive obtaining of submodules as a seperate explicity step.
2. Changes the `git clone` to use https.
Update the cxx_config_cc.oy port description generation to use the
port.is_source attribute.
Github Issue: https://github.com/gem5/gem5/issues/181
Change-Id: I3fa12c2fbb06083379118e57aedb8be414c0d929
When a linux kernel changes a page property, it flushes the related cache
lines. The kernel might change the page property before flushing the
cache lines. This results in the clflush might occur in an uncacheable region.
Currently, an uncacheable request must be a read or a write. However,
clflush request is neither of them.
This change aims to allow clflush requests to work on uncacheable regions.
Since there is no straightforward way to check if a packet is from a clflush
instruction, this change permits all Clean Invalidate Requests, which is
the type of request produced by clflush, to work on uncacheable regions.
Change-Id: Ib3ec01d9281d3dfe565a0ced773ed912edb32b8f
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
This fixes:
1. Most importantly: The submodule recursive update was incorrect. This
adds the recursive obtaining of submodules as a seperate explicity step.
2. Changes the `git clone` to use https.
Change-Id: Iad69e44b927a5aa982b49dffa6929c52fcc7ee72
Added save and restore checkpoint tests for arm-hello, x86-hello,
x86-fs, power-hello
Added mips and sparc test but mips does not support checkpoint and there
is a bug in sparc.
Added test file to run the tests.
- Added WrokloadResource in resource.py.
- depricated Workload and CustomWorkload.
- changed iscvmatched-fs.py with obtain resource for workload to test.
Change-Id: I2267c44249b96ca37da3890bf630e0d15c7335ed
Note: change example files back to original
This changes continue-on-error to be fail-fast instead, as
continue-on-error will mark failed matrix runs as
successful, whereas fail-fast makes sure everything in the matrix runs,
but gets marked as failed if part of it fails.
In this case the function is turned into a generator with the
"yield" of the generator the return the function's execution.
Change-Id: I4b06d64c5479638712a11e3c1a2f7bd30f60d188
This changes continue-on-error to be fail-fast instead, as
continue-on-error will mark failed matrix runs as
successful, whereas fail-fast makes sure everything in the matrix
runs, but gets marked as failed if part of it fails.
Change-Id: Ie20652c229b6cce9f1c0a45958b088391e7aae97
Any instructions require vector register should check if vector is
enabled. Any instructions need vtype CSR to execute them should check
vill bit beforehead.
On RISC-V when trap occurs the contents of PC register contains the
address of instruction that caused that trap (as opposed to the address
of instruction following it in instruction stream). Therefore this commit
does not advance the PC before reporting trap in SE mode.
Change-Id: I83f3766cff276312cefcf1b4ac6e78a6569846b9
Due to inverted logic in POWER fault handlers, unimplemented opcode and
trap faults did not report trap to GDB (if connected). This commit fixes
the problem.
While at it, I opted to use `if (! ...) { panic(...) }` rather than
`panic_if(...)`. I find it easier to understand in this case.
Change-Id: I6cd5dfd5f6546b8541d685e877afef21540d6824
The mem instructions usually executed from initiateAcc. We also need
to check vector required in those instructions
Change-Id: I97b4fec7fada432abb55ca58050615e12e00d1ca
Detected via this failing workload:
https://github.com/gem5/gem5/actions/runs/5861958237
Ir caused the following compilation error to be thrown:
```
src/arch/x86/kvm/x86_cpu.cc:1462:22: error: unused variable ‘rv’ [-Werror=unused-variable]
1462 | bool rv = isa->cpuid->doCpuid(tc, function, idx, cpuid);
| ^~
```
`rv` is unused in the .fast compilation as it's only used in the
`assert` statement immediately after.
To fix this, the `[[maybe_unused]]` annotation is used.
clang v8, when installed in this manner via Docker, did not install the
libstdc++. This caused compilation errors. This patch adds the
libstdc++-10-dev package to this Dockerfile.
Detected via this failing workload:
https://github.com/gem5/gem5/actions/runs/5861958237
It caused the following compilation error to be thrown:
```
src/arch/x86/kvm/x86_cpu.cc:1462:22: error: unused variable ‘rv’ [-Werror=unused-variable]
1462 | bool rv = isa->cpuid->doCpuid(tc, function, idx, cpuid);
| ^~
```
`rv` is unused in the .fast compilation as it's only used in the
`assert` statement immediately after.
To fix this, the `[[maybe_unused]]` annotation is used
Change-Id: Ib98dd859c62f171c8eeefae93502f92a8f133776
clang v8, when installed in this manner via Docker, did not install the
libstdc++. This caused compilation errors. This patch adds the
libstdc++-10-dev package to this Dockerfile.
Change-Id: Ia0f41e82b3df2d4bf32b418b0cb78111a35e0b9f
The previous exit event occurs when the dispatcher sends a completion
signal for a kernel, but gem5 does some kernel-based stats updates after
the signal is sent. Therefore, if these exit events are used as a way to
dump per-kernel stats, some of the stats for the kernel that just ended
will be in the next kernel's stat dump which is misleading.
This patch moves the exit event to where the stats are updated and only
exits if the dispatcher has requested a stat dump to prevent situations
where stats are updated mid-kernel.
Change-Id: I74dc1cad5fc90382a2a80564764b3e7c9fb65521
We're seeing some occasional connection timeouts in CI, possibly when we
aggressively hit the license server, so let's add a parameter to retry
the connection a few times.
Also, print the time required to connect to the server to help debug
issues.
When xbar encounters the address error, print out the port trace in the
packet for user to debug if the port trace is enabled.
To gain the packet of the access, the parameter of findPort() function
is changed from AddrRange to PacketPtr.
When running gem5 with "--debug-flags=PortTrace", we can see the full
path of the unexpected access when xbar cannot find the destination of
the address.
This patch changes the data type used for image size from int to
uint64_t. Current version allows initializing AbstractMemory types with
a maximum binary size of 2GiB. This will be limiting in many studies.
This patch changes the data type used for image size from int
to uint64_t. Current version allows initializing AbstractMemory
types with a maximum binary size of 2GiB. This will be limiting
in many studies.
Change-Id: Iea3bbd525d4a67aa7cf606f6311aef66c9b4a52c
The previous exit event occurs when the dispatcher sends a completion
signal for a kernel, but gem5 does some kernel-based stats updates after
the signal is sent. Therefore, if these exit events are used as a way to
dump per-kernel stats, some of the stats for the kernel that just ended
will be in the next kernel's stat dump which is misleading.
This patch moves the exit event to where the stats are updated and only
exits if the dispatcher has requested a stat dump to prevent situations
where stats are updated mid-kernel.
Change-Id: I74dc1cad5fc90382a2a80564764b3e7c9fb65521
We're seeing some occasional connection timeouts in CI, possibly
when we aggressively hit the license server, so let's add a
parameter to retry the connection a few times.
Also, print the time required to connect to the server to help
debug issues.
Change-Id: I804af28f79f893fcdca615d7bf82dd9b8686a74c
1. Add findPort(PacketPtr pkt) for getting the port trace from the Packet.
Keep the findPort(AddrRange addr_range) for recvMemBackdoorReq(...)
2. With the debug flag `PortTrace` enabled, user can see the full path of
the packet with the corresponding address when address error in xbar.
Change-Id: Iaf43ee2d7f8c46b9b84b2bc421a6bc3b02e01b3e
This sets continue-on-error to true on any scheduled test that uses a
matrix so we can have all sets of tests run regardless if one of them
fails or not.
- Updates the list of current gem5 Maintainers.
- Updates the MAINATAINERS.yaml comments to reflect gem5's contribution
policy and maintainer responsibilities since moving to GitHub.
- Updates the subsystem maintainers, based on maintainer's stated
preferences.
- Adds the optional 'expert' field to subsystems.
This sets continue-on-error to true on any scheduled test that
uses a matrix so we can have all sets of tests run regardless
if one of them fails or not.
Change-Id: I8f6137ebdf62a5cecd582387316c330c8a1401ca
In an SMT CPU, upon a squash, the mis-predicted(squashing) instructions
can still be executing at IEW and own phys registers. If these registers
are added back to the rename freelist on this Tick, the registers may be
renamed to be used by other SMT thread(s). This causes register
ownership hazards, which may eventually freeze the CPU. This problem
seems to date back to 2014
(https://www.mail-archive.com/gem5-users@gem5.org/msg10180.html).
This patch delays the freelist update to avoid the hazard.
I tested that this patch does not cause any performance impact for my
set of benchmarks on default non-SMT O3CPU.
In an SMT CPU, upon a squash, the phys regs used by
mispredicted instructions can still be owned by executing
instructions in IEW. If the regs are added back to freelist
on this tick, the reg may be renamed to be used by another
SMT thread. This causes reg ownership hazard, which may
eventually freeze the CPU.
This patch delays the freelist update to avoid the hazard.
Change-Id: I993b3c7d357269f01146db61fc8a7b83a989ea45
If status.vs == OFF, the RVV instruction should raise Illegal
Instruction according to RISC-V V spec. If RVV is not implemented, all
of the RVV instruction need to raise exception.
This moves the clean runner step in our yaml files to be at the
beginning of a job, so that if a runner goes down and is unable to clean
at the end, we can ensure that
subsequent jobs still run as expected.
Change-Id: Iba52694aefe03c550ad0bfdb5b5f938305273988
Added save and restore checkpoint tests for arm-hello, x86-hello, x86-fs, power-hello
Added mips and sparc test but mips does not support checkpoint and there is a bug in sparc.
Added test file to run the tests.
Change-Id: I2d3b96f95ee08aae921de9a885ac5be77d49f326
Due to a 60GB limit on the VMs the gem5 project's GitHub Actions
self-hosted Runners execute within, we cannot run tests which need to
download the gem5 Resource's PARSEC disk image (v1.0.0,
http://resources.gem5.org/resources/x86-parsec?version=1.0.0). This
image, 33GB, is too big and causes our runners to run out of disk space
and fail.
These changes can be reverted when we are able to increase the size of
our VMs.
The x86-parsec gem5 Resource (v1.0.0,
http://resources.gem5.org/resources/x86-parsec?version=1.0.0) is 33GB.
The gem5 GitHub Actions self-hosted runners do not have enough Disk
Space in the VMs they are run to download this. Ergo we skip it.
Change-Id: I290fe265f03ceca65b2bed87e9f4a4ad601e0fc1
This argument allows the passing of IDs of resources which should be
skipped for this check.
Note: A current limitation here is you cannot specify the version of a
resource. Passing the ID of a resource to this will skip the downloading
for all versions of that resource.
Change-Id: Ifdb7c2b71553126fd52a3d286897ed5dd8e98f7c
These tests are disabled due our GitHub Actions self-hosted Runners
having a 60GB of disk space. The PARSEC Disk Image Resource (v1.0.0,
http://resources.gem5.org/resources/x86-parsec?version=1.0.0) is 33GB
and is simply too big to download and unzip for these tests.
These tests can be reenabled when this issue is resolved.
Change-Id: I9a63aa1903cea3ce7942bdc85bcd0b24761d2f29
This moves the clean runner step in our yaml files to be at the
beginning of a job, so that if a runner goes down and is
unable to clean at the end, we can ensure that
subsequent jobs still run as expected.
Change-Id: Iba52694aefe03c550ad0bfdb5b5f938305273988
There was another missing file in the configs for the fs tests,
so this should allow the switcheroo tests to pass.
Change-Id: Ic4e26cceeb9209f176158b80eaaba88b47968c39
The dailies timed out as they were running the entire directory of tests
due to a wrong variable name being used. In addition, the names of tests
were adjusted to include the matrix type so the artifacts won't
overwrite each other
The configs directory for the fs tests was missing the
checkpoint.py file, causing some of the CI tests to fail.
Change-Id: Ifbd775ad658f96d06bea7bee554fe3bedcf5a5b5
* Solves issue #106 by updating the cpts with the necessary vector
registers.
Change-Id: Ifeda90e96097f0b0a65338c6b22a8258c932c585
util: clear vector_element field
Change-Id: I6c9ec4e71f66722b26de030fa139cd626bdb24dc
"tests/gem5/configs/download_check.py" is used by the
"test-resource-downloading" test (defined in
"tests/gem5/gem5-resources/test_download_resources.py" and ran as part
of the "very-long" suite).
Prior to this change "download_check.py" would download each resource,
check it's md5, then at the end of the script remove all the downloaded
resources. This is inefficient on disk space and was causing our
"very-long" suite of tests to require a machines with a lot of disk
space to run.
This change alters 'download_check.py" to remove each resource after the
md5 check. Thus, only one resource is ever downloaded and present at any
given time during the running of this script.
The dailies timed out as they were running the entire directory
of tests due to a wrong variable named being used. In addition,
the names of tests were adjusted to include the matrix type so
the artifacts won't overwrite each other
Change-Id: Iaa1be8e0cfcbf9d64f1a674590bfe2bf1f0dae90
Updates the directories in which tests are run in accordance
with the refactoring of the testing directory
Change-Id: I93f5c5b0236c5180da04deb425ec2ed6804fa003
This field was added to give gem5 community members a change to register
that they have expertise in a particular subsystem but do not much to
assign themselves the responsibilities of a subsystem maintainer.
Those who have registered interest on being an subsystem expert have
been added.
Change-Id: I8f532e381e8e42257b2a68ac48204131479d8cd0
This change incorporates changes to the set of maintainers and the
maintainers assigned to each subsystem based on individual maintainers'
preferences.
Change-Id: Ic2c39907763282e89936fa0d90e3c1a105a0d917
This comment is updated to reflect new gem5 policy and its move to
GitHub and a Pull Request contribution model.
Change-Id: Iec909ffa0cca254fdbe56ce3165cb948cdd0cbce
"tests/gem5/configs/download_check.py" is used by the
"test-resource-downloading" test (defined in
"tests/gem5/gem5-resources/test_download_resources.py" and ran as part
of the "very-long" suite).
Prior to this change "download_check.py" would download each resource,
check it's md5, then at the end of the script remove all the downloaded
resources. This is inefficient on disk space and was causing our
"very-long" suite of tests to require a machines with a lot of disk
space to run.
This change alters 'download_check.py" to remove each resource after the
md5 check. Thus, only one resource is ever downloaded and present at any
given time during the running of this script.
Change-Id: I38fce100ab09f66c256ccddbcb6f29763839ac40
This merges initial support for RVV. Currently, only the simple CPUs are supported.
The decoder stalls for every vsetvl instruction.
In the future, we will implement vsetvl as a control instruction as described in #144
This changeset reorganizes the testing directory within gem5,
removing the bigger config folders, then replacing them with
smaller configs folders within each directory containing only
the scripts necessary for that set of tests. It also changes
the locations of the config scripts used in each set of tests,
and updates the tests accordingly.
Change-Id: I38297d4496f72bd5cf7200471acd5c4d93002b27
This change adds READMEs to each directory within tests/gem5,
with a short description of the test, as well as how to run it.
Change-Id: I574ebcdc837848b52f21e8c0f8856ff09463284b
Since the O3 and Minor CPU models do not support RVV right now as the
implementation stalls the decode until vsetvl instructions are exectued,
this change calls `fatal` if RVV is not explicitly enabled.
It is possible to override this if you explicitly enable RVV in the
config file.
Change-Id: Ia801911141bb2fb2bedcff3e139bf41ba8936085
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
TODOs:
+ vcompress.vm
Change-Id: I86eceae66e90380416fd3be2c10ad616512b5eba
Co-authored-by: Yang Liu <numbksco@gmail.com>
Co-authored-by: Fan Yang <1209202421@qq.com>
Co-authored-by: Jerin Joy <joy@rivosinc.com>
arch-riscv: Add LICENCE to template files
Change-Id: I825e72bffb84cce559d2e4c1fc2246c3b05a1243
* TODOs:
+ Vector Segment Load/Store
+ Vector Fault-only-first Load
Change-Id: I2815c76404e62babab7e9466e4ea33ea87e66e75
Co-authored-by: Yang Liu <numbksco@gmail.com>
Co-authored-by: Fan Yang <1209202421@qq.com>
Co-authored-by: Jerin Joy <joy@rivosinc.com>
Change-Id: I84363164ca327151101e8a1c3d8441a66338c909
Co-authored-by: Yang Liu <numbksco@gmail.com>
Co-authored-by: Fan Yang <1209202421@qq.com>
arch-riscv: Add a todo to fix vsetvl stall on decode
Change-Id: Iafb129648fba89009345f0c0ad3710f773379bf6
This commit add regs and configs for vector extension
* Add 32 vector arch regs as spec defined and 8 internal regs for
uop-based vector implementation.
* Add default vector configs(VLEN = 256, ELEN = 64). These cannot
be changed yet, since the vector implementation has only be tested
with such configs.
* Add disassamble register name v0~v31 and vtmp0~vtmp7.
* Add CSR registers defined in RISCV Vector Spec v1.0.
* Add vector bitfields.
* Add vector operand_types and operands.
Change-Id: I7bbab1ee9e0aa804d6f15ef7b77fac22d4f7212a
Co-authored-by: Yang Liu <numbksco@gmail.com>
Co-authored-by: Fan Yang <1209202421@qq.com>
Co-authored-by: Jerin Joy <joy@rivosinc.com>
arch-riscv: enable rvv flags only for RV64
Change-Id: I6586e322dfd562b598f63a18964d17326c14d4cf
Added a parameter to kernel_disk_workload which allows users to change the disk device location. Maintained the previous way of setting a disk device as the default, however added a function to allow users to override this default
The MMIO trace contains register values for parts of the GPU that are
not modeled in gem5, such as registers related to the graphics core.
Since MI100 and MI200 do not have anything that is not modeled, the
MMIO trace is not needed, therefore it does not need to be used or
checked and the command line option goes away entirely for MI100/200.
Change-Id: I23839db32b1b072bd44c8c977899a99347fc9687
Starting with ROCm 5.4+, MI100 and MI200 make use of the translate
further bit in the page table. This bit enables mixing 4kiB and 2MiB
pages and is functionally equivalent to mixing page sizes using the
PDE.P bit for which gem5 currently has support.
With PDE.P bit set, we stop walking and the page size is equal to the
level in the page table we stopped at. For example, stopping at level
2 would be a 1GiB page, stopping at level 3 would be a 2MiB page.
This assumes most pages are 4kiB.
When the F bit is used, it is assumed most pages are 2MiB and we will
stop walking at the 3rd level of the page table unless the F bit is set.
When the F bit is set, the 2nd level PDE contains a block fragment size
representing the page size of the next PDE in the form of 2^(12+size).
If the next page has the F bit set we continue walking to the 4th level.
The block fragment size is hardcoded to 9 in the driver therefore we
assert that the block fragment size must be 0 or 9.
This enables MI200 with ROCm 5.4+ in gem5. This functionality was
determine by examining the driver source code in Linux and there is no
public documentation about this feature or why the change is made in or
around ROCm 5.4.
Change-Id: I603c0208cd9e821f7ad6eeb1d94ae15eaa146fb9
This SDMA packet is much more common starting around ROCm 5.4.
Previously this was mostly used to clear page tables after an
application ended and was therefore left unimplemented. It is
now used for basic operation like device memsets.
This patch implements constant fill as it is now necessary.
Change-Id: I9b2cf076ec17f5ed07c20bb820e7db0c082bbfbc
When using the new operator, delete should be called
on any allocated memory after it's use is complete.
Change-Id: Id5fcfb264b6ddc252c0a9dcafc2d3b020f7b5019
A previous change added a vop2Helper to remove 100s of lines of common
code from VOP2 instructions related to processing SDWA and DPP support.
That change inadvertently changed the type of operand source 0 from
const to non-const. The vector container operator[] does not allow
reading a scalar value such as a constant, a dword literal, etc. The
error shows up in the form of: assert(!scalar) in operand.hh.
Since the SDWA and DPP cases need to modify the source vector and
non-SDWA/DPP cases might require const, we make a non-const copy of the
const source 0 vector and place it in a temporary non-const vector. This
non-const vector is passed to the lambda function implementation of the
instruction. This prevents needing a const and non-const version of the
lambda and avoids needing to propagate the template parameters through
the various SDWA/DPP helper methods which seems like it will not work
anyways as they need to modify the vector.
As a result of this, as more VOP2 instructions are implemented using
this helper, they will need to specify the const and non-const template
parameters of the vector container needed for the instruction.
Change-Id: Ia0b3c550d7de32b830040007a110f4821e3385aa
When using the new operator, delete should be called
on any allocated memory after it's use is complete.
Change-Id: Id5fcfb264b6ddc252c0a9dcafc2d3b020f7b5019
In "src/cpu/testers/gpu_ruby_test" a random number generator was used.
This was using the CPP "<random>" library. This patch changes it to the
gem5 random class (that declared in "base/random.hh").
In addition to this, undeterministic behavior has been removed. Via
"protocol_tester.cc" the RNG is either seeded with a seed specified by
the user, or goes with the gem5 default seed. This ensures reproducable
runs. Prior to this patch the RNG was seeded with `time(NULL)`. This
made finding faults difficult.
This, at least partially, addresses Issue #138
Change-Id: Ia8e9f7b87e91323f828e0b7f6c3906c0c5793b2c
arch-x86: Move CPUID values to python
CPUID values for X86 are currently hard-coded in the C++ source file.
This makes it difficult to configure the bits if needed. Move these to
python instead. This will provide a few benefits:
1. We can enable features for certain configurations, for example AVX
can be enabled when the KVM CPU is used, but otherwise should not be
enabled as gem5 does not have full AVX support.
2. We can more accurately communicate things like cache/TLB sizes based
on the actual gem5 configuration. The CPUID values are can be used by
some libraries, e.g., MPI, to query system topology.
3. Enabling some bits breaks things in certain configurations and this
can be prevented by configuring in python. For example, enabling AVX
seems to currently be breaking SMP, meaning gem5 can only boot one CPU
in that configuration.
In "src/cpu/testers/gpu_ruby_test" a random number generator was used.
This was using the CPP "<random>" library. This patch changes it to the
gem5 random class (that declared in "base/random.hh").
In addition to this, undeterministic behavior has been removed. Via
"protocol_tester.cc" the RNG is either seeded with a seed specified by
the user, or goes with the gem5 default seed. This ensures reproducable
runs. Prior to this patch the RNG was seeded with `time(NULL)`. This
made finding faults difficult.
Change-Id: Ia8e9f7b87e91323f828e0b7f6c3906c0c5793b2c
A previous change added a vop2Helper to remove 100s of lines of common
code from VOP2 instructions related to processing SDWA and DPP support.
That change inadvertently changed the type of operand source 0 from
const to non-const. The vector container operator[] does not allow
reading a scalar value such as a constant, a dword literal, etc. The
error shows up in the form of: assert(!scalar) in operand.hh.
Since the SDWA and DPP cases need to modify the source vector and
non-SDWA/DPP cases might require const, we make a non-const copy of the
const source 0 vector and place it in a tempoary non-const vector. This
non-const vector is passed to the lambda function implementation of the
instruction. This prevents needing a const and non-const version of the
lambda and avoids needing to propagate the template parameters through
the various SDWA/DPP helper methods which seems like it will not work
anyways as they need to modify the vector.
As a result of this, as more VOP2 instructions are implemented using
this helper,they will need to specify the const and non-const template
parameters of the vector container needed for the instruction.
Change-Id: Ia0b3c550d7de32b830040007a110f4821e3385aa
This change syncs the repo's contributing documentation with that of the
website's contributing documentation:
https://www.gem5.org/contributing
From now on we'll attempt to keep the repo's CONTRIBUTING.md
documentation in sync with that on the website.
Change-Id: I2c91e6dd5cd7a9b642377878b007d7da3f0ee2ad
AVX is a requirement for some ROCm libraries, such as rocBLAS, which are
themselves requirements for libraries higher up the stack like PyTorch.
This patch sets the necessary CPUID bits in the GPUFS config to enable
AVX, AVX2, and various SSE features so that applications using these
libraries do not cause an illegal instruction trap.
Change-Id: Id22f543fb2a06b268271725a54075ee6a9a1f041
The extended state CPUID function is used to set the values of the XCR0
register as well as specify the size of storage for context switching
storage for x87 and AVX+. This function is iterative and therefore
requires (1) marking it as such in the hsaSignificantIndex function (2)
setting multiple sets of 4-tuples for the default CPUID values where the
last 4-tuple ends with all zeros.
Change-Id: Ib6a43925afb1cae75f61d8acff52a3cc26ce17c8
Related to the recent changes with moving CPUID values to python, this
value is needed to enable AVX and needs a way to be exposed to python as
well in order to set the bit and the corresponding CPUID values at the
same time.
Change-Id: I3cadb0fe61ff4ebf6de903018a8d8a411bfdb4e0
Various CPUID functions will return different values depending on the
value of ECX when executing the CPUID instruction. Add support for this
in the X86 KVM CPU. A subsequent patch will add a CPUID function which
requires iterating through multiple ECX values.
Change-Id: Ib44a52be52ea632d5e2cee3fb2ca390b60a7202a
CPUID values for X86 are currently hard-coded in the C++ source file.
This makes it difficult to configure the bits if needed. Move these to
python instead. This will provide a few benefits:
1. We can enable features for certain configurations, for example AVX
can be enabled when the KVM CPU is used, but otherwise should not be
enabled as gem5 does not have full AVX support.
2. We can more accurately communicate things like cache/TLB sizes based
on the actual gem5 configuration. The CPUID values are can be used by
some libraries, e.g., MPI, to query system topology.
3. Enabling some bits breaks things in certain configurations and this
can be prevented by configuring in python. For example, enabling AVX
seems to currently be breaking SMP, meaning gem5 can only boot one CPU
in that configuration.
Change-Id: Ib3866f39c86d61374b9451e60b119a3155575884
This change syncs the repo's contributing documentation with that of the
website's contributing documentation:
https://www.gem5.org/contributing
From now on we'll attempt to keep the repo's CONTRIBUTING.md
documentation in sync with that on the website.
Change-Id: I2c91e6dd5cd7a9b642377878b007d7da3f0ee2ad
The length of the path of #include pragmas can be more than
79-character long. The length of the path of a #include pragma
can be outside of user's control.
The refactoring to the daily tests was missing the dependency on the
'name-artifacts' job, which is necessary for downloading all the gem5
artifacts. This adds it in so the tests run as expected.
Change-Id: I0d71ab147395f41c881f2b24597bc07006e1f9c0
This fixes issue #131 by reverting to the old behavior of performing all
atomics at the system level. To do this the SLC bit needs to be set for
all atomic requests.
Change-Id: I63f4e449be1b02c933832d09700237f8c8026f4c
The refactoring to the daily tests was missing the dependency
on the 'name-artifacts' job, which is necessary for downloading
all the gem5 artifacts. This adds it in so the tests run as
expected.
Change-Id: I0d71ab147395f41c881f2b24597bc07006e1f9c0
This splits up the gem5 library example tests by Suite UID, as right now
running them together uses the runner for a long period of time. It is
important to note that doing this means additional tests from this
directory will need to be manually added, such as the kvm tests.
Change-Id: Ib2a0aca08f9b51b60e9dd0528324372cf2d98c05
The length of the path of the #include pragma can be more than
79-character long.
Change-Id: Id72250c166370c7f456bd1f7d05589a49c14c33d
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
This fixes#131 by reverting to the old behavior of performing all
atomics at the system level. To do this the SLC bit needs to be set for
all atomic requests.
Change-Id: I63f4e449be1b02c933832d09700237f8c8026f4c
In the memory controller, MemCtrl::MemoryPort::recvFunctional, when the
functional request is satisfied by the ctrl-response queue, correctly
make the packet a response.
This change mirrors AbstractMemory::functionalAccess, which uses
Packet::makeResponse() after satisfying the request.
Note:
bool trySatisfyFunctional(..) functions return true or false based on
whether the request was satisfied.
void recvFunctional(..) functions modify the packet to indicate
successful request satisfaction.
This updates the TESTING.md to reflect the current state of the tests in
the gem5 repository and how they interact with the GitHub Actions
infrastructure.
These testing scripts are no longer used since moving to GitHub. The
Nightly (now refered to as "Daily" tests), the Weekly Tests, Compiler
Tests and the CI (Kokoro, pre-commit) tests are run via the GitHub
Actions infrastructure. Their setup is described via Workflow files in
".github/workflows". To run tests locally please consult the
"TESTING.md" file.
These scripts may still be useful to reference and are therefore being
moved into a deprecated state.
Change-Id: Ie75c2f4f1179eb73d0f45ba0b259e8d79aa02ace
This moves the gem5 library example tests into a separate matrix,
so they can run on separate runners
Change-Id: Ie9f51b5bae9e7e424d1c98b545b4cf92b481a2fb
This changes the daily tests to use a matrix in order to run
tests. It also includes forces the cleaning step to run
regardless of success or failure. With this refactoring, now
all builds of gem5 must finish before any tests run, and all
tests download all artifacts from all the build runs.
Change-Id: I16e1bc9acaf619feb85fba53eb6129e7df3fe409
These testing scripts are no longer used since moving to GitHub. The
Nightly (now refered to as "Daily" tests), the Weekly Tests, Compiler
Tests and the CI (Kokoro, pre-commit) tests are run via the GitHub
Actions infrastructure. Their setup is described via Workflow files
in ".github/workflows". To run tests locally please consult the
"TESTING.md" file.
These scripts may still be useful to reference and are therefore being
moved into a deprecated state.
Change-Id: Ie75c2f4f1179eb73d0f45ba0b259e8d79aa02ace
In the memory controller, MemCtrl::MemoryPort::recvFunctional,
when the functional request is satisfied by the ctrl-response queue,
correctly make the packet a response.
This change mirrors AbstractMemory::functionalAccess, which uses
Packet::makeResponse() after satisfying the request.
Change-Id: I47917062d3270915a97eed2c9fade66ba17019eb
This change:
1. Removes the 'Specifying a subset of tests to run' section. This
section is no longer useful since tests are no longer divided up so
neatly by tags as they once were.
2. Adds a section outlining the 'quick', 'long' and 'very-long' tests
and how they may be selected and run.
Change-Id: I61370dd80cc925a15d1a22755faa7d62e810862f
In https://gem5-review.googlesource.com/c/public/gem5/+/52047 inst.pc
was changed from an object to a pointer. It is possible that this
pointer is null (e.g., if there is an interrupt and there is a bubble).
Make sure to check that it's not null before printing.
I believe that other places this pointer is dereferenced without an
explicit null check are safe, but I'm not certain.
Should fix#97
Change-Id: Idbe246cfdb62d4d75416d41b451fb3c076233bbc
When compiling with clang-14 I received the following error:
```
src/base/bitfield.hh:328:1: error: function 'findLsbSetFallback' is not needed and will not be emitted [-Werror,-Wunneeded-internal-declaration]
```
This function was introduced in PR #76.
This fixes this compiler warning/error by using `[[maybe_unused]]`.
Change-Id: I0b99eab0a9e42ee1687e7a0594a5a7bf9588b422
FEAT_TLBIOS has been introduced by a recent patch [1] which was however
missing to include the outer shareable case in the Msr disambiguation
switch. Which meant the TLBIOS instructions were decoded as normal MSR
instructions, with no effect whatsoever on the TLBs
[1]: https://gem5-review.googlesource.com/c/public/gem5/+/70567
Change-Id: I41665a4634fbe0ee8cc30dbc5d88d63103082ae9
FEAT_TLBIOS has been introduced by a recent patch [1] which
was however missing to include the outer shareable case in the
Msr disambiguation switch. Which meant the TLBIOS instructions
were decoded as normal MSR instructions, with no effect whatsoever
on the TLBs
[1]: https://gem5-review.googlesource.com/c/public/gem5/+/70567
Change-Id: I41665a4634fbe0ee8cc30dbc5d88d63103082ae9
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Added a GLC atomic latency parameter (glc-atomic-latency) used when
enqueueing response messages regarding atomics directly performed in
the TCC. This latency is added in addition to the L2 response latency
(TCC_latency). This represents the latency of performing an atomic
within the L2.
With this change, the TCC response queue will receive enqueues with
varying latencies as GLC atomic responses will have this added GLC
atomic latency while data responses will not. To accommodate this in
light of the queue having strict FIFO ordering (which would be violated
here), this change also adds an optional parameter bypassStrictFIFO to
the SLICC enqueue function which allows overriding strict FIFO
requirements for individual messages on a case-by-case basis. This
parameter is only being used in the TCC's atomic response enqueue call.
Change-Id: Iabd52cbd2c0cc385c1fb3fe7bcd0cc64bdb40aac
Create a new workflow file that will hold jobs that are for managing the
repository, issues, prs, etc. This changeset then adds a job to close
issues that have been open for 30 days without a response someone marks
the issue as "needs details."
Change-Id: I23b9b6aa5fa67f205e116c88d5449cb69f53b6f9
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
* base: Enable stl_helpers::operator<< in _formatString
The string format (%s) eventually relies on bare operator<< to
display any type T. This gives the opportunity to use the helpers in
stl_helpers. This patch enables printing enums, pairs, tuples,
vectors, maps and others in a PRINTF debug macro without any extra
manual operation.
Change-Id: I8ac85133ebadcb95354598c1cfe687d8fffb89e2
* base: Add Printer util class to force use of operator<< helpers
Wrapping any value in a Printer instance before using operator<< will
force the use of stl_helpers::operator<<.
Change-Id: I7b505194eeabc3e0721effd9b5ce98f9e151b807
* base: Fix typo in ostream_helpers.hh
Change-Id: I283a5414f3add4f18649b77153dcbcc8661bc81e
* base: Disambiguate null optional representation in ostream helper
Change-Id: I5b093555688566cc405248d3a448a8f3efa67888
* base: Add unit test for std::optional ostream helper
Change-Id: I6fb9ced5e6461de5685638a162b5534e10710e20
* base: Ostream helpers Printer unit test
Change-Id: I11db89e85fd40c12bceecb41cadee78b8e871d7b
* base: Unit test for ostream helpers for pointers and smart ptr
Change-Id: Ifa87e8b69fdd9a4869250ab40311f352e8f54ed9
* base: Coding style fix in ostream_helpers.test.cc
Change-Id: I095c7048fad35e63f979aa601bfc8cde65c9077b
* base: Test shared_ptr in ostream_helpers.test.cc
Change-Id: I553df0614f1dd6eef2061c4dc1794af8c543b78f
---------
Co-authored-by: Gabriel Busnot <gabriel.busnot@arteris.com>
* stdlib,configs,tests: Remove `Resource` class use
This class is deprecated, but was still used in various example
configuration scriots and tests. This patch replaces it with the
`obtain_resource` function.
Change-Id: I0c89bf17783ccaaafc18072aaeefb5d1e207bc55
* configs: Remove `CustomDiskImageResource` use
The class is deprecated but was still used in the SPEC example scripts.
This patch replaces it with the `DiskImageResource` class.
Change-Id: Ie0697fe59a3d737b05eb45ff3bc964f42b0387e0
* configs,tests: Remove `CustomResource` use
This class is deprecated but was still used in example scripts and
mentioned, incorrectly, in comments in the pyunit tests. This patch
removes these.
Change-Id: Icb6d02f47a5b72cd58551e5dcd59cc72d6a91a01
* stdlib: Remove '\' in Workload docstring example
This example shows how to use the Workload. The backslash is not correct Python and would fail if used in this way.
Co-authored-by: Jason Lowe-Power <jason@lowepower.com>
---------
Co-authored-by: Jason Lowe-Power <jason@lowepower.com>
When compiling with clang-14 I received the following error:
```
src/base/bitfield.hh:328:1: error: function 'findLsbSetFallback' is not needed and will not be emitted [-Werror,-Wunneeded-internal-declaration]
```
This function was introduced in PR #76.
This fixes this compiler warning/error by using `[[maybe_unused]]`.
Change-Id: I0b99eab0a9e42ee1687e7a0594a5a7bf9588b422
In https://gem5-review.googlesource.com/c/public/gem5/+/52047 inst.pc
was changed from an object to a pointer. It is possible that this
pointer is null (e.g., if there is an interrupt and there is a bubble).
Make sure to check that it's not null before printing.
I believe that other places this pointer is dereferenced without an
explicit null check are safe, but I'm not certain.
Change-Id: Idbe246cfdb62d4d75416d41b451fb3c076233bbc
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
* stdlib: Change resource compatibility warning
If the gem5 version is "develop", the warning will not
be thrown.
Change-Id: Id2be1c4323c6ca06c5503c2885c1608f8d119420
* stdlib: Change resource compatibility warning
If the gem5 version is "develop", the warning will not
be thrown.
Change-Id: Id2be1c4323c6ca06c5503c2885c1608f8d119420
* tests: Edit obtain_resources warning test
Since we are editing the warning message for
the develop branch, the test removes the
warning message as well.
Change-Id: I90882340188360bb3435344cdc14b324412c6c0e
---------
Co-authored-by: Jason Lowe-Power <jason@lowepower.com>
This splits up the gem5 library example tests by Suite UID, as
right now running them together uses the runner for a long
period of time. It is important to note that doing this means
additional tests from this directory will need to be
manually added, such as the kvm tests.
Change-Id: Ib2a0aca08f9b51b60e9dd0528324372cf2d98c05
* cpu-kvm: Add a variable signifying whether we are using perf
Change-Id: Iaa081e364f85c863f781723b5524d267724ed0e4
Signed-off-by: Hoa Nguyen <hoanguyen@ucdavis.edu>
* cpu-kvm: Making it clear the functionalities are specific to KVM
Change-Id: I982426f294d90655227dc15337bf73c42a260ded
Signed-off-by: Hoa Nguyen <hoanguyen@ucdavis.edu>
* cpu-kvm: Make perf optional
Change-Id: I8973c2a96575383976cea7ca3fda478f83e95c3f
Signed-off-by: Hoa Nguyen <hoanguyen@ucdavis.edu>
* configs: Add an example config of using KVM without perf
Change-Id: Ic69fa7dac4f1a2c8fe23712b0fa77b5b22c5f2df
Signed-off-by: Hoa Nguyen <hoanguyen@ucdavis.edu>
* Apply suggestions from code review
Co-authored-by: Jason Lowe-Power <jason@lowepower.com>
* misc: Add an example to the panic
Change-Id: Ic1fdfb955e5d8b9ad1d4f0a2bf30fa8050deba70
Signed-off-by: Hoa Nguyen <hoanguyen@ucdavis.edu>
* misc: Add warning of not using perf when using KVM CPU
Change-Id: I96c0832fb48c63a79773665ca6228da778ef0497
Signed-off-by: Hoa Nguyen <hoanguyen@ucdavis.edu>
* misc: Fix stuff
Change-Id: Ib407ae7407955b695f0e0f2718324f41bb0d768f
Signed-off-by: Hoa Nguyen <hoanguyen@ucdavis.edu>
* misc: style fix
Change-Id: I7275942e43f46140fdd52c975f76abb3c81b8b0a
Signed-off-by: Hoa Nguyen <hoanguyen@ucdavis.edu>
---------
Signed-off-by: Hoa Nguyen <hoanguyen@ucdavis.edu>
Co-authored-by: Jason Lowe-Power <jason@lowepower.com>
Added support for performing non-SLC-set atomics in the TCC.
Previously, all atomics were being passed on by the TCC to the
directory. With this change, atomics will only be passed on if the SLC
bit is set or if the line isn't present or available in the TCC.
If a non-SLC atomic is passed on to the directory because it is not
present in the TCC, the atomic will be performed on the return path on
the Data event. To accommodate the directory not performing the atomic
in this case, this change also passes the SLC bit on to the directory.
The previously-named "Atomic" action has been renamed to
"AtomicPassOn", with the new "Atomic" corresponding to an atomic
performed directly in the TCC.
Change-Id: Ibf92f71ddceb38bd1b0da70b0a786cc4c3cf2669
In the previous version of gem5, the source files of extra directories
will copy to build directory for compilation. It will not be a problem
if the extra directories include *.h(*.hh) from the other extra
directories.
After the patch applied from the change
(https://gem5-review.googlesource.com/c/public/gem5/+/68758). The
source files of extra directories will not copy to the build directory
unless the user compiles gem5 with "--duplicate-sources". It will
cause the compilation error if the code includes a header file from
other repositories.
For example, assume we want to compile gem5 with "foo/bar1" and
"foo/bar2" repositories and they are gem5-independent. There are some
header files in "foo/bar1/a.h" "foo/bar1/b.h" and "foo/bar2/d.h". If
the code "foo/bar1/sample.c" tries to include the file "foo/bar2/d.h".
They usually include the file by declare "#include bar2/d.h" in
foo/bar1/sample.c. It can work if --duplicate-sources is specified in
gem5 build because they will copy to <builddir>/bar1 and
<builddir>/bar2 respectively, and -I<builddir> is specified by default
whether duplicate_sources or not. It will raise the compilation error
if the user does not specify it.
The change is aimed to let the situation work without
duplicate-sources specified by adding parent extra directory, and
adding them before the extra directories. If the --duplicate-sources
specified, it will not add parent extra directories to avoid repeat
include paths.
Change-Id: I461e1dcb8266d785f1f38eeff77f9d515d47c03d
* base: Fix Memoizer constructor parameter type
* base: switch from new to mk_unq in amo.test.cc
* base: Fix memory management in IniFile
* base: Fix memory management in Trie
* sim: Fix out-of-bounds access in CheckpointIn::setDir
Change-Id: Iac50bbf01b6d7acc458c786da8ac371582a4ce09
---------
Co-authored-by: Gabriel Busnot <gabriel.busnot@arteris.com>
Added dummy definition of __has_builtin to bitfield.hh's hasBuiltinCtz,
which is already being done in popCount.
Change-Id: I4a1760a142209462bb807c6df4bc868284b6f5f3
* tests,util-docker,misc: Drop compiler support for GCC 7
Change-Id: I8b17b77c92b88e78a8cb6d38cd5f045dbe80a643
* tests,util-docker,misc: Drop compiler support for clang 6.0
Change-Id: Ie3b6bfe889ad1d119cee0c9ffb04c5996517922e
* util-docker,tests,misc: Remove Ubuntu 18.04 support
18.04 is no longer supported. This patch removes specific 18.04 compiler
tests and removes our 18.04 dockerfiles. Images will no longer be
produced for specific 18.04 tasks.
Compiler images for GCC and Clang, which used 18.04 have been updated to
use 20.04.
Change-Id: I6338ab47af3287a25a557dbbeaeebcfccfdec9fc
* misc: Add Bug Report Issue Template
Change-Id: I3acf7a1991f889462c0f2604d251dead563846c2
* misc: Cleanup bug_report.md
* Inform the reader to use codeblocks where approproate.
* Inform the reader they should include the Python configuraiton
script and state parameters passed.
Change-Id: Ib0b8e9a6d3ed199c435917acfdf958073d4faa04
* misc: Update CI test workflow
This updates our CI tests to clean the runners after every
workflow, to make sure no hanging files cause problems for
future tests
Change-Id: Iff6a702bbc2e86a31e4c18ef9764a3cfd3af2f7d
* misc: Update scheduled workflows to clean runners
This updates our scheduled tests to clean up any remaining
files after running tests to avoid anything hanging for
future runs.
Change-Id: Icfdd5a0559337ad0e62d108a47f4e5a12e0db677
* misc: Fix spacing in workflow files
Some commands were incorrectly spaced
Change-Id: Id340dc77bfb5c5d579b5f1e5b3ddeabea4a35ea8
* base: Generalize findLsbSet to std::bitset<N>
* base: Split builtin and fallback implementations of findLsbSet
* base: Add more unit testing for findLsbSet
Change-Id: Id75dfb7d306c9a8228fa893798b1b867137465a9
---------
Co-authored-by: Gabriel Busnot <gabriel.busnot@arteris.com>
* misc: Update README to README.md
This change converts the text-based README to markdown. This works
better with modern source-control systems, most notably, GitHub.
The README.md has been broken down into sections to better organize the
document.
This section now included expanded information on Reporting bugs and
Requesting Features.
Due to renaming 'README' to 'README.md', this code was generating the
following for "info.py":
```
README.md = "<FILE CONTENTS HERE>"
```
As '.' is used to access member variables/methods in python. To fix this
"infopy.oy" now replaces "." with "_". As such the generated in in
"info.py" is now:
```
README_MD = "<FILE CONTENTS HERE>"
This puts GitHub Discussions and GitHub Issues towards the top of the
list. This is to incentivize their usage.
Change-Id: I18018ba23493f43861544497f23ec59f1e8debe1
---------
Co-authored-by: Jason Lowe-Power <jason@lowepower.com>
Latest protobuf library depends on abseil libraries. We should rely on
pkgconfig to give us correct dependency. We still keep the old check as
fallback.
Change-Id: I529ea1f61e5bbc16b2520ab1badff3d8264f1c33
AMD GCN3 and Vega GPUs assume a max of 16 WG/CU. Any GPU WG with more
than 1 WF requires a hardware barrier to allow WFs in the WG to
synchronize locally. However, currently the default gem5 GPU
configuration assumes only 4 barriers per CU, which artificially
prevents applications with > 4 WG/CU that could run simultaneously
from running simultaneously.
This fix resolves this by updating the default number of hardware barriers
per CU to 16, which mimics the support described in slide 39 here:
https://www.olcf.ornl.gov/wp-content/uploads/2019/10/
ORNL_Application_Readiness_Workshop-AMD_GPU_Basics.pdf
Change-Id: Ib7636a13359d998e676c1790f436a83ce88cbfc0
This change adds a new file to m5out which is citations.bib.
This file will contain the citations to the papers which describe the
aspects of the gem5 simulator that the simulation uses. In other words,
each simulation configuration could generate a different bib file
referencing different works.
Each SimObject can now have a set of citations associated with it. After
the system is built (in `instantiate`), the citations.bib file is
created by parsing all SimObjects that have been instantiated and taking
the union of their associated citations.
This commit is not meant to add all citations, but to act as an example
for others to add more citations to gem5.
Change-Id: Icd5c46fd9ee44adbeec1fea162657f5716f7e5ef
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Added WIB (Waiting on Writethrough Ack; Will be Bypassed) state which
is transitioned to when a dirty line in the TCC is evicted in a
bypassed read. Previously, we were transitioning to invalid.
While a WI (Waiting on Writethrough Ack) state exists, transitions from
it on WBAck deallocates the TBE, which contains SLC bit information
needed to trigger the Bypass event when the read response from the
directory comes in.
Without this change, WB acknowledgements from the directory in read
bypass evicts (with the SLC bit set) were being treated as if they were
read responses, leading to an invalid transition panic.
Change-Id: I703c3fe8af0366856552bb677810cb1a8f2896de
This patch changes the way memory ranges are devided when using
multiple cores for linear traffic. The current state assigns the
same range to multiple linear generators so all the cores start
generating the same trace. This patch devides the overall range
assigned to the generator ([min_addr:max_addr]) between the cores.
Change-Id: I49f69b3d61b590899f8d54ee3be997ad22d7fa9b
Co-authored-by: Jason Lowe-Power <jason@lowepower.com>
Co-authored-by: mkjost0 <50555529+mkjost0@users.noreply.github.com>
Co-authored-by: Bobby R. Bruce <bbruce@ucdavis.edu>
* tests: Remove large files from resource specialization tests
These tests were downloading resources (but not actually using them) to
ensure the `obtain_resources` function returned the correct
specialization and was parsing the data correctly. As these resources
were never used, this patch removes the downloading of large files in
this case, replacing them with smaller binaries.
Change-Id: I7b33aa6be8ec65b296b470cd50b128c084f2b71f
* tests: Rename 'looppoint-json...' example specailization
Appending '-example' to the end avoids any name clashes.
'looppoint-json-restore-resources-region-1' shares this ID with a real
resources in gem5 Resources.
Change-Id: I9853e97cb71e768c46ad173b5a497609f4acc3b2
* tests: Remove disk image download from Workload Checks
This download is big and unecessary (the workload is never run as part
of the test). This patch changes this test to instead download a small
binary in it's palce (again, this does not matter as this is never
actually run as a disk image).
Change-Id: I74034ebcf5f2501917847c258570e88a8f653a5d
* tests: Update IDs in Pyunit Workload checks
Some of these IDs clash with real workloads/resources in gem5 Resources.
To avoid any possible clashes or confusions, all the mock
resources/workloads in this suite of tests has been renamed with
'-example' appending on the end of the ID.
Change-Id: Ifd907b2321416bf05e8c4e646024d179da2ca487
When shiftAmt is 0 for a UQRSHL instruction, the code called bits() with
incorrect arguments. This fixes a left-shift of 0 to be a NOP/mov, as
required.
Change-Id: Ic86ca40ac42bfb767a09e8c65a53cec56382a008
Co-authored-by: Marton Erdos <marton.erdos@arm.com>
* gpu-compute: Remove use of 'std::random_shuffle'
This was deprecated in C++14 and removed in C++17. This has been
replaced with std::random. This has been implemented to ensure
reproducible results despite (pseudo)random behavior.
Change-Id: Idd52bc997547c7f8c1be88f6130adff8a37b4116
* dev-amdgpu: Add missing 'overrides'
This causes warnings/errors in some compilers.
Change-Id: I36a3548943c030d2578c2f581c8985c12eaeb0ae
* dev: Fix Linux specific includes to be portable
This allows for compilation in non-linux systems (e.g., Mac OS).
Change-Id: Ib6c9406baf42db8caaad335ebc670c1905584ea2
* gpu-compute: Add missing include in dispatcher.cc
Due to some cherry-picking onto the release-staging branch, there was a
missing "sim/sim_exit.hh" include in "src/gpu-compute/dispatcher.cc".
This was causing compilation errors.
This is being added to the v23.0.0 release as a hotfix.
Change-Id: I1043ecf5c41ad6afc0e91311b196f4801646002f
Issue-on: https://gem5.atlassian.net/browse/GEM5-1332
* misc: Update version to v23.0.0.1
Change-Id: I3bbcfd4dd9798149b37d4a2824fe63652e29786c
* misc: Update RELEASE-NOTES.md for v23.0.0.1 hotfix
Change-Id: Ieced7f693a8cbef586324dfe7ce826da16d9a3c3
* scons: Fix sanitizer lib link for clang
Change-Id: I2441466c5c9343afd938185b8ec5047d4e95ac70
* scons: Statically link libubsan when using sanitizers with gcc
Change-Id: I362a1fb87771454ad94e439847a85d19108f375a
---------
Co-authored-by: Gabriel Busnot <gabriel.busnot@arteris.com>
* gpu-compute: Remove use of 'std::random_shuffle'
This was deprecated in C++14 and removed in C++17. This has been
replaced with std::random. This has been implemented to ensure
reproducible results despite (pseudo)random behavior.
Change-Id: Idd52bc997547c7f8c1be88f6130adff8a37b4116
* dev-amdgpu: Add missing 'overrides'
This causes warnings/errors in some compilers.
Change-Id: I36a3548943c030d2578c2f581c8985c12eaeb0ae
* dev: Fix Linux specific includes to be portable
This allows for compilation in non-linux systems (e.g., Mac OS).
Change-Id: Ib6c9406baf42db8caaad335ebc670c1905584ea2
* tests: Add 'VEGA_X86' build target to compiler-tests.sh
Change-Id: Icbf1d60a096b1791a4718a7edf17466f854b6ae5
* tests: Add 'GCN3_X86' build target to compiler-tests.sh
Change-Id: Ie7c9c20bb090f8688e48c8619667312196a7c123
This operator can be safely brought in scope when needed with "using
stl_helpers::operator<<".
In order to provide a specialization for operator<< with
stl_helpers-enabled types without loosing the hability to use it with
other types, a dual-dispatch mechanism is used. The only entry point
in the system is through a primary dispatch function that won't
resolve for non-helped types. Then, recursive calls go through the
secondary dispatch interface that sort between helped and non-helped
types. Helped typed will enter the system back through the primary
dispatch interface while other types will look for operator<< through
regular lookup, especially ADL.
Change-Id: I1609dd6e85e25764f393458d736ec228e025da32
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67666
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Currently, gem5 suffers from several bugs related
to Python interpreter's locale encoding issues.
gem5 will crash when the working directory contains
Non-ASCII characters.
The reason is that Python 3.8+ introduces a new
interpreter startup sequence [1]. The startup
sequence consists of three phases:
1. Python core runtime preinitialization
2. Python core runtime initialization
3. Main interpreter configuration
Stage 1 determining the encodings used for system
interfaces.
However, gem5 doesn't preinitialize the Python
interpreter. Thus, the locale settings do not take
effect. This patch preinitialize the Python for
Python 3.8+.
Also, this patch avoid the use of `Py_SetProgramName`,
which is deprecated since Python 3.11[3].
[1] https://peps.python.org/pep-0432/
[2] https://peps.python.org/pep-0587/
[3] https://docs.python.org/3/c-api/init.html#c.Py_SetProgramName
Change-Id: I08a2ec6ab2b39a95ab194909932c8fc578c745ce
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70898
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Roger Chang <rogerycchang@google.com>
Since commits will be squashed and merged in GitHub, we only
require one of the commits to contain a Change-ID within a
pull request
Change-Id: I0fbb1c0e79009097456193fbe3c6fa20746e4805
This updates the change-id code to refer to commit messages in
pull requests instead of on pushes.
Change-Id: I308f02b4616804b386140d5875a79878eccd721e
A GUI web-based tool to manage gem5 Resources.
Can manage in two data sources,
a MongoDB database or a JSON file.
The JSON file can be both local or remote.
JSON files are written to a temporary file before
writing to the local file.
The Manager supports the following functions
on a high-level:
- searching for a resource by ID
- navigating to a resource version
- adding a new resource
- adding a new version to a resource
- editing any information within a searched resource
(while enforcing the gem5 Resources schema
found at: https://resources.gem5.org/gem5-resources-schema.json)
- deleting a resource version
- undo and redo up to the last 10 operations
The Manager also allows a user to save a session
through localStorage and re-access it through a password securely.
This patch also provides a
Command Line Interface tool mainly for
MongoDB-related functions.
This CLI tool can currently:
- backup a MongoDB collection to a JSON file
- restore a JSON file to a MongoDB collection
- search for a resource through its ID and
view its JSON object
- make a JSON file that is compliant with the
gem5 Resources Schema
Co-authored-by: Parth Shah <helloparthshah@gmail.com>
Co-authored-by: Harshil2107 <harshilp2107@gmail.com>
Co-authored-by: aarsli <arsli@ucdavis.edu>
Change-Id: I8107f609c869300b5323d4942971a7ce7c28d6b5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71218
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
The unconditional exit event when a kernel completes that was added in
c644eae2dd is causing scripts that do not
ignore unknown exit events to end simulation prematurely. One such
script is the apu_se.py script used in SE mode GPU simulation. Make this
exit conditional to the parameter being set to a valid value to avoid
this problem.
Change-Id: I1d2c082291fdbcf27390913ffdffb963ec8080dd
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/72098
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Also increase maximum enum supported value to 0x100
This library is courtesy of Daniil Goncharov and hosted at
https://github.com/Neargye/magic_enum/tree/master. It enables
compile-time and runtime string to/from enum value conversion as well
as other fancy enum introspection-like things.
The MIT lincence should not conflict with gem5 in any way.
Copying the file for now but a submodule might be more suitable if
gem5 uses them in the future.
Change-Id: Ib0c8f943b79c703f1247c11c7291fb4fb1548b0f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67665
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
These types include std::pair, std::tuple, all iterable types and any
composition of these. Convenience hash factory and computation
functions are also provided.
These functions are in the stl_helpers namespace and must not move to
::std which could cause undefined behaviour. This is because
specialization of std templates for std or native types (or
composition of these) is undefined behaviour. This inconvenience can't
be circumvented for generic code. Users are free to bring these hash
implementations to namespace std after specialization for their own
non-std and non-native types.
Change-Id: Ifd0f0b64e5421d5d44890eb25428cc9c53484eb3
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67663
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Daniel Carvalho <odanrc@yahoo.com.br>
Tested-by: kokoro <noreply+kokoro@google.com>
Prior to this patch, when a memory controller was failing at sending a
response to AbstractController, it would not wakeup until the next
request. This patch gives the opportunity to Ruby models to notify
memory response buffer dequeue so that AbstractController can send a
retry request if necessary.
A dequeueMemRspQueue function has been added AbstractController to
automate the dequeue+notify operation.
Note that models that don't notify AbstractController will continue
working as before.
Change-Id: I261bb4593c126208c98825e54f538638d818d16b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67658
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
When you build Python from scratch, the modules would be separated
shared libraries. They would be dlopen when doing module import. To make
the separated shared libraries can share the symbol in the binary, we
should add -rdynamic when compliing.
Change-Id: I26bf9fd7ea5068fd2d08c8f059b37ff34073e8c2
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/72040
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Different GPUFS disk images have different root partitions that Linux
needs to boot from. In particular, Ubuntu's new installer has a GRUB
partition that cannot seem to be removed. Adding this as an option
prevents needing to edit a config script to change one character each
time a different disk image is used.
Change-Id: Iac2996ea096047281891a70aa2901401ac9746fc
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71918
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Vega adds multiple new D16 instructions which load a byte or short into
the lower or upper 16 bits of a register for packed math. The decoder
table has subDecode tables for FLAT instructions which represents 32
opcodes in each subDecode table. The subDecode table for opcodes 32-63
is missing so it is added here.
The opcode for V_SWAP_B32 is also off by one- In the ISA manual this
instruction is opcode 81, the instruction before is 79, and there is no
opcode 80, so the decoder entry is swapped with the invalid decoding
below it.
Change-Id: I278fea574ea684ccc6302d5b4d0f5dd8813a88ad
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71899
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The PCI read/write functions are atomic functions in gem5, meaning they
expect a response with a latency value on the same simulation Tick. For
reads to a PCI device, the response must also include a data value read
from the device.
The AMDGPU device has a PCI BAR which mirrors the frame buffer memory.
Currently reads are done atomically, but writes are sent to a DMA device
without waiting for a write completion ACK. As a result, it is possible
that writes can be queued in the DMA device long enough that another
read for a queued address arrives. This happens very deterministically
with the AtomicSimpleCPU and causes GPUFS to break with that CPU.
This change makes writes to the frame BAR atomic the same as reads. This
avoids that problem and as a result the AtomicSimpleCPU can now load the
driver for GPUFS simulations.
Change-Id: I9a8e8b172712c78b667ebcec81a0c5d0060234db
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71898
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
TracingExtension contains a stack recording the port names
passed through of the Packet. The target receiving the Packet
can dump out the whole path of this Packet for the debug purpose.
This mechanism can be enabled with the debug flag PortTrace.
Change-Id: Ic11e708b35fdddc4f4b786d91b35fd4def08948c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71538
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Yu-hsin Wang <yuhsingw@google.com>
This patch includes several changes to the gem5 tools interface to the
gem5-resources infrastructure. These are:
* The old download and JSON query functions have been removed from the
downloader module. These functions were used for directly downloading
and inspecting the resource JSON file, hosted at
https://resources.gem5.org/resources. This information is now obtained
via `gem5.client`. If a resources JSON file is specified as a client,
it should conform to the new schema:
https//resources.gem5.org/gem5-resources-schema.json. The old schema
(pre-v23.0) is no longer valid. Tests have been updated to reflect
this change. Those which tested these old functions have been removed.
* Unused imports have been removed.
* For the resource query functions, and those tasked with obtaining the
resources, the parameter `gem5_version` has been added. In all cases
it does the same thing:
* It will filter results based on compatibility to the
`gem5_version` specified. If no resources are compatible the
latest version of that resource is chosen (though a warning is
thrown).
* By default it is set to the current gem5 version.
* It is optional. If `None`, this filtering functionality is not
carried out.
* Tests have been updated to fix the version to “develop” so the
they do not break between versions.
* The `gem5_version` parameters will filter using a logic which will
base compatibility on the specificity of the gem5-version specified in
a resource’s data. If a resource has a compatible gem5-version of
“v18.4” it will be compatible with any minor/hotfix version within the
v18.4 release (this can be seen as matching on “v18.4.*.*”.) Likewise,
if a resource has a compatible gem5-version of “v18.4.1” then it’s
only compatible with the v18.4.1 release but any of it’s hot fix
releases (“v18.4.1.*”).
* The ‘list_resources’ function has been updated to use the
“gem5.client” APIs to get resource information from the clients
(MongoDB or a JSON file). This has been designed to remain backwards
compatible to as much as is possible, though, due to schema changes,
the function does search across all versions of gem5.
* `get_resources` function was added to the `AbstractClient`. This is a
more general function than `get_resource_by_id`. It was
primarily created to handle the `list_resources` update but is a
useful update to the API. The `get_resource_by_id` function has been
altered to function as a wrapped to the `get_resources` function.
* Removed “GEM5_RESOURCE_JSON” code has been removed. This is no longer
used.
* Tests have been cleaned up a little bit to be easier to read.
* Some docstrings have been updated.
Things that are left TODO with this code:
* The client_wrapper/client/abstract_client abstractions are rather
pointless. In particular the client_wrapper and client classes could
be merged.
* The downloader module no longer does much and should have its
functions merged into other modules.
* With the addition of the `get_resources` function, much of the code in
the `AbstractClient` could be simplified.
Change-Id: I0ce48e88b93a2b9db53d4749861fa0b5f9472053
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71506
Reviewed-by: Kunal Pai <kunpai@ucdavis.edu>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
(cherry picked from commit 82587ce71b)
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71739
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
* According to the manual, load reservations must be cleared on a
failed or a successful SC attempt.
* A load reservation can be arbitrarily large. The current
implementation was reserving something different than cacheBlockSize
which could lead to problems if snoop addresses are cache block
aligned. This patch implementation assumes a cacheBlock granularity.
* Load reservations should also be cleared on faults
Change-Id: I64513534710b5f269260fcb204f717801913e2f5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71558
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Roger Chang <rogerycchang@google.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The cache is modeled after an AMD EPYC cache, but not exactly
like AMD EPYC cache.
- K cores per core complex (CCD), each core has one private split L1,
and one private L2.
- K cores in the same CCD share 1 slice of L3 cache, which is not
a victim cache.
- There can be multiple CCDs, which communicate with each other via
Cross-CCD router. The Cross-CCD rounter is also connected to
directory controllers and dma controllers.
- All links latency are set to 1.
Change-Id: Ib64248bed9155b8e48e5158ffdeebf1f2d770754
Signed-off-by: Hoa Nguyen <hoanguyen@ucdavis.edu>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71598
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Many of the outstanding issues with the GPU model are related to
instructions not having SDWA/DPP implementations and executing by
ignoring the special registers leading to incorrect executiong.
Adding SDWA/DPP is current very cumbersome as there is a lot of
boilerplate code.
This changeset adds helper methods for VOP2 with one instruction
changed as an example. This review is intended to get feedback
before applying this change to all VOP2 instructions that support
SDWA/DPP.
Change-Id: I1edbc3f3bb166d34f151545aa9f47a94150e1406
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70738
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The uint_fast16_t is the integer at least 16 bits size, it can be
32, 64 bits and more. Usually most of the simulations are in the
x86-64 linux host, the size of uint_fast16_t is 64 bits. Therefore,
there is no problem for double precision float operations and it can
pass FloatMM test. However, in the Mac OS, the size of uint_fast16_t
is 16 bits, it will lose the upper bits when converting float
register bits to freg_t and it will generate unexpected results for
FloatMM test.
The change can guarantee that the size of data in freg_t is at least
64 bits and it will not lose any data from floating point to freg_t.
Reference:
https://developer.apple.com/documentation/kernel/uint_fast16_thttps://codebrowser.dev/glibc/glibc/stdlib/stdint.h.html
Change-Id: I3df6610f0903cdee0f56584d6cbdb51ac26c86c8
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71578
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Currently fmax and fmin instructions convert source float registers such as
Fs1_bits to float64_t(or float32_t and float16_t) many times in the single
instruction. It is not efficient for the future maintenance of these
instructions.
The change adds non-register float_t intermediate variables fs1 and fs2 to
keep converted results so that we don’t need to do it repeatedly. It also
added an intermediate variable fd for specific float type to assume the upper
bits of the packed float register are all one.
Change-Id: Ic508d5255db6c4b38ca4df6dd805df440c043fff
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71479
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Add two kernel dispatch-based exit events that are useful for limiting
the simulation and enabling debug flags at specific GPU kernels. Since
the KVM CPU typically used with GPUFS is not deterministic, this help
with enabling debug flags when the Tick number may vary. The exit at GPU
kernel option can also limit simulation by only simulating a few hundred
kernels, for example, and exit at a determined point.
Change-Id: I81bae92a80c25fc38c41e999aa662e1417b7a20d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71418
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
This patch makes changes to the stdlib based on the gem5 Vision project.
Firstly, a MongoDB database is supported.
A JSON database's support is continued.
The JSON can either be a local path or a raw GitHub link.
The data for these databases is stored in src/python
under "gem5-config.json".
This will be used by default.
However, the configuration can be overridden:
- by providing a path using the GEM5_CONFIG env variable.
- by placing a gem5-config.json file in the current working directory.
An AbstractClient is an abstract class that implements
searching and sorting relevant to the databases.
Clients is an optional list that can be passed
while defining any Resource class and obtain_resource.
These databases can be defined in the config JSON.
Resources now have versions. This allows for a
single version, e.g., 'x86-ubuntu-boot', to have
multiple versions. As such, the key of a resource is
its ID and Version (e.g., 'x86-ubuntu-boot/v2.1.0').
Different versions of a resource might be compatible
with different versions of gem5.
By default, it picks the latest version compatible with the gem5 Version
of the user.
A gem5 resource schema now has additional fields.
These are:
- source_url: Stores URL of GitHub Source of the resource.
- license: License information of the resource.
- tags: Words to identify a resource better, like hello for hello-world
- example_usage: How to use the resource in a simulation.
- gem5_versions: List of gem5 versions that resource is compatible with.
- resource_version: The version of the resource itself.
- size: The download size of the resource, if it exists.
- code_examples: List of objects.
These objects contain the path to where a resource is
used in gem5 example config scripts,
and if the resource itself is used in tests or not.
- category: Category of the resource, as defined by classes in
src/python/gem5/resources/resource.py.
Some fields have been renamed:
- "name" is changed to "id"
- "documentation" is changed to "description"
Besides these, the schema also supports resource specialization.
It adds fields relevant to a specific resource as specified in
src/python/gem5/resources/resource.py
These changes have been made to better present
information on the new gem5 Resources website.
But, they do not affect the way resources are used by a gem5 user.
This patch is also backwards compatible.
Existing code doesn't break with this new infrastructure.
Also, refs in the tests have been changed to match this new schema.
Tests have been changed to work with the two clients.
Change-Id: Ia9bf47f7900763827fd5e873bcd663cc3ecdba40
Co-authored-by: Kunal Pai <kunpai@ucdavis.edu>
Co-authored-by: Parth Shah <helloparthshah@gmail.com>
Co-authored-by: Harshil Patel <harshilp2107@gmail.com>
Co-authored-by: aarsli <arsli@ucdavis.edu>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70858
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
According to the GIC specification (IHI0069) reserved addresses in the
GIC memory map are treated as RES0. We allow to disable this behaviour
and panic instead (reserved_res0 = False, which is what we have been
doing so far) to catch development bugs (in gem5 and in the guest SW)
Change-Id: I23f98519c2f256c092a52425735b8792bae7a2c7
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71138
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
There are three bugs fixed in this patch:
1. The `dram_3_dir` was missing the "dramsim3" directory.
2. Missing `not` when checking if configs is a directory.
3. Missing `not` when checking if input file is a file.
Change-Id: I185f4832c1c2f1ecc4e138c148ad7969ef9b6fd4
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71038
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
We have some customized protocols in gem5 repository and they require
the include path from src directory. It causes the users of those
protocols need to handle the include path correctly by theirselve. This
is tedious and unstable. We should add the default include path in
SIMGEN command line to prevent issues.
Change-Id: I2a3748646567635d131a8fb4099e02e332691e97
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71118
Reviewed-by: Wei-Han Chen <weihanchen@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
2023-05-31 23:47:30 +00:00
2389 changed files with 208433 additions and 174046 deletions
# ensures we have a change-id in every commit, needed for gerrit
check-for-change-id:
# runs on github hosted runner
runs-on:ubuntu-22.04
if:github.event.pull_request.draft == false
get-date:
# We use the date to label caches. A cache is a a "hit" if the date is the
# request binary and date are the same as what is stored in the cache.
# This essentially means the first job to run on a given day for a given
# binary will always be a "miss" and will have to build the binary then
# upload it as that day's binary to upload. While this isn't the most
# efficient way to do this, the alternative was to run take a hash of the
# `src` directory contents and use it as a hash. We found there to be bugs
# with the hash function where this task would timeout. This approach is
# simple, works, and still provides some level of caching.
runs-on:ubuntu-latest
outputs:
date:${{ steps.date.outputs.date }}
steps:
- uses:actions/checkout@v3
with:
fetch-depth:0
- name:Check for Change-Id
run:|
# loop through all the commits in the pull request
for commit in $(git rev-list ${{ github.event.pull_request.base.sha }}..${{ github.event.pull_request.head.sha }}); do
git checkout $commit
if (git log -1 --pretty=format:"%B" | grep -q "Change-Id: ")
then
# passes as long as at least one change-id exists in the pull request
exit 0
fi
done
# if we reach this part, none of the commits had a change-id
echo "None of the commits in this pull request contains a Change-ID, which we require for any changes made to gem5. "\
"To automatically insert one, run the following:\n f=`git rev-parse --git-dir`/hooks/commit-msg ; mkdir -p $(dirname $f) ; "\
"curl -Lo $f https://gerrit-review.googlesource.com/tools/hooks/commit-msg ; chmod +x $f\n Then amend the commit with git commit --amend --no-edit, and update your pull request."
# Scheduled workflows run on the default branch by default. We
# therefore need to explicitly checkout the develop branch.
ref:develop
- name:Compile build/GCN3_X86/gem5.opt
run:scons build/GCN3_X86/gem5.opt -j $(nproc)
- name:Get Square test-prog from gem5-resources
uses:wei/wget@v1
with:
args: -q http://dist.gem5.org/dist/develop/test-progs/square/square # Removed -N bc it wasn't available within actions, should be okay bc workspace is clean every time:https://github.com/coder/sshcode/issues/102
- name:Run Square test with GCN3_X86/gem5.opt (SE mode)
- name:Get allSyncPrims-1kernel from gem5-resources
uses:wei/wget@v1
with:
args:-q http://dist.gem5.org/dist/develop/test-progs/heterosync/gcn3/allSyncPrims-1kernel# Removed -N bc it wasn't available within actions, should be okay bc workspace is clean every time
- name:Run allSyncPrims-1kernel sleepMutex test with GCN3_X86/gem5.opt (SE mode)
@@ -370,7 +403,6 @@ Steve Raasch <sraasch@umich.edu>
Steve Reinhardt <stever@gmail.com> Steve Reinhardt ext:(%2C%20Nilay%20Vaish%20%3Cnilay%40cs.wisc.edu%3E%2C%20Ali%20Saidi%20%3CAli.Saidi%40ARM.com%3E) <stever@gmail.com>
Steve Reinhardt <stever@gmail.com> Steve Reinhardt <stever@eecs.umich.edu>
Steve Reinhardt <stever@gmail.com> Steve Reinhardt <steve.reinhardt@amd.com>
Steve Reinhardt <stever@gmail.com> Steve Reinhardt <Steve.Reinhardt@amd.com>
Stian Hvatum <stian@dream-web.no>
Sudhanshu Jha <sudhanshu.jha@arm.com>
Sujay Phadke <electronicsguy123@gmail.com>
@@ -378,16 +410,18 @@ Sungkeun Kim <ksungkeun84@tamu.edu>
**[HOTFIX]** Adds PR <https://github.com/gem5/gem5/pull/1930> as a hotfix to v24.1.0.
This fixes a bug which was was causing the CHI coherence protocol to fail in multi-core simulations.
The fix sets the `RubySystem` pointer when the TBE is allocated, instead of when `set_tbe` is performed, thus ensuring that the `RubySystem` pointer is set before the TBE is used.
# Version 24.1.0.1
**[HOTFIX]** This hotfix release applies the following:
* Generalization of the class types in CHI RNF/MN generators thus fixing an issue with missing attributes when using the CHI protocol.
PR: <https://github.com/gem5/gem5/pull/1851>.
* Add Sphinx documentation for the gem5 standard library.
This is largely generated from Python docstrings.
See "docs/README" for more information on building and deploying Sphinx documentation.
PR: <https://github.com/gem5/gem5/pull/335>.
* Add missing `RubySystem` member and related methods in `PerfectCacheMemory`'s entries.
This was causing assertions to trigger in "src/mem/ruby/commonNetDest.cc".
PR: <https://github.com/gem5/gem5/pull/1864>.
* Add `useSecondaryLoadLinked` function to "src/mem/ruby/slicc_interface/ProtocolInfo.hh".
This fixes a bug which was introduced after the removal of the `PROTOCOL_MESI_Two_Level` and `PROTOCOL_MESI_Three_Level` MACROs in v24.1.0.0.
These MACROs were being used to infer if `Load_Linked` requests are sent to the Ruby protocol or not.
The `useSecondaryLoadLinked` function has been introduced to specify this directly where needed.
PR: <https://github.com/gem5/gem5/pull/1865>.
# Version 24.1
## User facing changes
* The [behavior of the statistics `simInsts` and `simOps` has been changed](https://github.com/gem5/gem5/pull/1615).
* They now reset to zero when m5.stats.reset() is called.
* Previously, they incorrectly did not reset and would increase monotonically throughout the simulation.
* The statistics `hostInstRate` and `hostOpRate` are also affected by this change, as they are calculated using simInsts and simOps respectively.
* Instances of kB, MB, and GB have been changed to KiB, MiB, and GiB for memory and cache sizes #1479
* A warning has also been added for usages of kB, MB, and GB.
* Please use KiB, MiB, and GiB in the future.
* Random number generator is no longer shared across components. This may modify simulation results. #1534
### gem5 Standard Library
* SE mode has been added to X86Board, X86DemoBoard, and RiscvBoard #1702
* ArmDemoBoard and RiscvDemoBoard have been added to the standard library #1478#1490
* The values in the X86DemoBoard have been modified to make it more similar to the other DemoBoards #1618
### Prefetchers
* The [behavior of the`StridePrefetcher` has been altered](https://github.com/gem5/gem5/pull/1449) as follows:
* The addresses used to compute the stride has been changed from word aligned addresses to cache line aligned addresses.
* It returns if the stride does not match, as opposed to issuing prefetching using the new stride --- the previous, incorrect behavior.
* Returns if the new stride is 0, indicating multiple reads from the same cache line.
* Fix implementation of Best Offset Prefetcher #1403
* Add SMS Prefetcher
### Configuration scripts
* Update the full system gem5 Standard Library example scripts to use Ubuntu 24.04 disk images #1491
* Add RV32 option to configs/example/riscv/fs_linux.py #1312
* Other updates to configs/example/riscv/fs_linux.py #1753
### Multisim
* simerr.txt and simout.txt now output into the correct sub-directory when -re is passed #1551
### Compiler and OS support
As of this release, gem5 supports Clang versions 14 through 18 and GCC versions 10 through 14.
Other versions may work, but they are not regularly tested.
### Multiple Ruby Protocols in a Single Build
There are many developer facing / API changes to enable Ruby multiple protocols in a single build.
The most notable changes are:
* Removes the RubySlicc_interfaces.slicc file from the SLICC includes of
every protocol.
* Changes required: If you have a custom protocol, you will need to remove the line `include "RubySlicc_interfaces.slicc"` from your .slicc file.
* Updates the build configurations variables
* **USER FACING CHANGE**: The Ruby protocols in Kconfig have changed names (they are now the same case as the SLICC file names), and in addition, So, after this commit, your build configurations need to be updated. You can do so by running `scons menuconfig <build dir>` and selecting the right ruby options. Alternatively, if you're using a `build_opts` file, you can run `scons defconfig build/<ISA> build_opts/<ISA>` which should update your config correctly.
* **USER FACING CHANGE**: The the "build_opts/ALL" build spec has been updated to include all Ruby protocols . As such, gem5 compilations of the "ALL" compilation target will include all gem5 Ruby protocols (previously just MESI_Two_Level).
* A "build_opts/NULL_ALL_RUBY" build spec has been added to include all Ruby protocols for a "NULL ISA" build . This is useful for testing Ruby protocols without the overhead of a full ISA and is used in gem5's traffic generator tests.
* A "build_opts/ARM_X86" build spec has been added due to a unique restriction in the "tests/gem5/fs/linux/arm" tests which requires a compilation of gem5 with both ARM and X86 and solely the MESI_Two_Level protocol.
### Multiple RubySystem objects in a simulation
Simulation configurations can now create multiple `RubySystem`s in the same simulation.
Previously this was not possible due to `RubySystem` sharing variables across all `RubySystems` (e.g., cache line size).
Allowing this feature requires developer facing changes for custom Ruby protocols.
The most common changes will be:
* Modify your custom protocol SLICC files, replace any instances of `RubySystem::foo()` with `m_ruby_system->foo()`, and recompile. `m_ruby_system` is automatically set by SLICC generated code.
* If your custom protocol contains local `WriteMask` declarations (e.g., `WriteMask tmp_mask;`), modify the protocol so that `tmp_mask.setBlockSize(...)` is called. Use the block size of the `RubySystem` here (e.g., you can use `other_mask.getBlockSize()` or get block size from another object).
* Modify your python configurations to assign the parameter `ruby_system` for the python classes `RubySequencer`, `RubyDirectoryMemory`, and `RubyPortProxy` or any derived classes. You will receive an error at the start of gem5 if this is not done.
* If your python configuration uses a `RubyPrefetcher`, modify the configuration to assign the `block_size` parameter to the cache line size of the `RubySystem` the prefetcher is part of.
The complete list of changes are:
*`AbstractCacheEntry`, `ALUFreeListArray`, `DataBlock`, `Message`, `PerfectCacheMemory`, `PersistentTable`, `TBETable`, `TimerTable`, and `WriteMask` classes now require the cache line size to be explicitly set. This is handled automatically by the SLICC parser but must be done explicitly in C++ code by calling `setBlockSize()`.
*`RubyPrefetcher` now requires `block_size` be assigned in python configurations.
*`CacheMemory` now requires a pointer to the `RubySystem` to be set. This is handled automatically by the SLICC parser but must be done explicitly in C++ code by calling `setRubySystem()`.
*`RubyDirectoryMemory`, `RubyPortProxy`, and `RubySequencer` now require a pointer to the `RubySystem` to be set by python configurations. If you have custom protocols using `DirectoryMemory` or derived classes from it, the `ruby_system` parameter must be set in the python configuration.
*`ALUFreeListArray` and `BankedArray` now require a clock period to be set in C++ using `setClockPeriod()` and no longer require a pointer to the `RubySystem`.
* You may no longer call `RubySystem::getBlockSizeBytes()`, `RubySystem::getBlockSizeBits()`, etc. You must have a pointer to the `RubySystem` you are a part of and call, for example, `ruby_system->getBlockSizeBytes()`.
*`MessageBuffer::enqueue()` has two new parameters indicating if the `RubySystem` has randomization and warmup enabled. You must explicitly specify these values now.
## ArmISA changes/improvements
### Architectural extensions
Architectural support for the following extensions:
* FEAT_TTST
* FEAT_XS
### Bugfixes
* Add support of AArch32 VRINTN/X/A/Z/M/P instructions
* Add support of AArch32 VCVTA/P/N/M instructions
* The following syscalls have been added in SE mode
* readv
* poll
* pread64
* pwrite64
* truncate64
* The following syscalls have been fixed in SE mode when running on a 32bit HOST:
* getcwd
* lseek
### CPU changes
Before this release the Arm TLBs were using an hardcoded fully associative model with LRU replacement policy.
The associativity and replacement policy of the Arm TLBs are now configurable with the IndexingPolicy and ReplacementPolicy classes by setting the indexing_policy and replacement_policy params.
While default behaviour is still LRU + FA, the L2 TLB in the ArmMMU (l2_shared) has been converted from being a fully associative structure into being a 5-way set associative.
PR [1084](https://github.com/gem5/gem5/pull/1084) introduced two new CHI relevant classes.
* The first one is the CHIGenericController. This is a purely C++ based / abstract interface of a Coherence Controller for ruby.
It is meant to bypass SLICC and removes the limitation of using the gem5 Sequencer and associated data structures.
* The second one is the CHI-TLM controller, which extends the aforementioned CHIGenericController. This is a bridge between the AMBA TLM 2.0 implementation of CHI [1](https://developer.arm.com/documentation/101459/latest) [2](https://developer.arm.com/Architectures/AMBA#Downloads) with the gem5 (ruby) one.
In other words it translates AMBA CHI transactions into ruby messages (which are then forwarded to the MessageQueues)
and vice versa.
```text
ARM::CHI::Payload, CHIRequestMsg
<--> CHIDataMsg
ARM::CHI::Phase CHIResponseMsg
CHIDataMsg
```
In this way it will be possible to connect external RNF models to the ruby interconnect via the CHI-TLM library
## RISC-V ISA improvements
* Use sign extend for all address generation #1316
* Fix implicit int-to-float conversion in .isa files #1319
* Implement Zcmp instructions #1432
* Add support for riscv hardware probing syscall #1525
* Add support for Zicbop extension #1710
* Fix vector instruction assertion caused by speculative execution #1711
## GPU model improvements
The GPUFS model is now available in the standard library!
There is a new `ViperBoard` in `gem5.prebuilt.viper`.
This board is an initial implementation and will be improved in the next versions of gem5.
There is an example script in `configs/example/gem5_library/x86-mi300x-gpu.py` that shows how to use the `ViperBoard`.
See #1636.
### Other GPU changes
* Vega10 has been deprecated #1619
* Replacement policy has been improved #1564
* Swizzle multi-dword scratch requests now supported #1445
* Many improvements to Vega implementation including memtime, SDWA, SDWAB, and DPP instructions #1350, #1378
* Matrix Core Engines (AMD's equivalent to NVIDIA's TensorCores) now supported! #1248, #1700
* Pannotia tests integrated into weekly tests #1584
## Other Miscellaneous Changes
### Other Ruby Related Changes
* RubyHitMiss debug flag #1260
* Prevent LL/SC livelock in MESI protocols #1399
* Added files for [generating Sphinx documentation](https://github.com/gem5/gem5/pull/335) for the gem5 standard library.
### Other
* Looppoint analysis object #1419
* Add global and local instruction trackers for raising instruction executed exit events with multi-core simulation #1433
### Development
* Removal of Gerrit Change-ID requirement #1486
# Version 24.0.0.1
**[HOTFIX]** Fixes a bug affecting the use of the `IndirectMemoryPrefetcher`, `SignaturePathPrefetcher`, `SignaturePathPrefetcherV2`, `STeMSPrefetcher`, and `PIFPrefetcher` SimObjects.
Use of these resulted in gem5 crashing a gem5 crash with the error message "Need is_secure arg".
The fix to this introduced to the gem5 develop branch in the <https://github.com/gem5/gem5/pull/1374> Pull Request.
The commits in this PR were cherry-picked on the gem5 stable branch to create the v24.0.0.1 hotfix release.
# Version 24.0
gem5 Version 24.0 is the first major release of 2024.
During this time there have been 298 pull requests merged, comprising of over 600 commits, from 56 unique contributors.
## API and user-facing changes
* The GCN3 GPU model has been removed in favor of the newer VEGA_X85 GPU model.
* gem5 now supports building, running, and simulating Ubuntu 24.04.
### Compiler and OS support
As of this release gem5 support Clang version 6 to 16 and GCC version 10 to 13.
While other compilers and versions may work, they are not regularly tested.
gem5 now supports building, running, and simulating on Ubuntu 24.04.
We continue to support 22.04 with 20.04 being deprecated in the coming year.
The majority of our testing is done on Ubuntu LTS systems though Apple Silicon machines and other Linux distributions have also been used regularly during development.
Improvements have been made to ensure a wider support of operating systems.
## New features
### gem5 MultiSim: Multiprocessing for gem5
The gem5 "MultiSim" module allows for multiple simulations to be run from a single gem5 execution via a single gem5 configuration script.
This allows for multiple simulations to be run in parallel in a structured manner.
To use MultiSim first create multiple simulators and add them to the MultiSim with the `add_simulator` function.
If needed, limit the maximum number of parallel processes with the `set_num_processes` function.
Then run the simulations in parallel with the `gem5` binary using `-m gem5.utils.multisim`.
Here is an example of how to use MultiSim:
```python
importgem5.utils.multisimasmultisim
# Set the maximum number of processes to run in parallel
multisim.set_num_processes(4)
# Create multiple simulators.
# In this case, one for each workload in the benchmark suite.
forworkloadinbenchmark_suite:
board=X86Board(
# ...
)
board.set_workload(workload)
# Useful to set the ID here. This is used to create unique output
# directorires for each gem5 process and can be used to idenfify and
The output directory ("m5out" by default) will contain sub-directories for each simulation run.
The sub-directory will be named after the simulator ID set in the configuration script.
We therefore recommend setting the simulator ID to something meaningful to help identify the output directories (i.e., the workload run or something identifying the meaningful characteristics of the simulated system in comparison to others).
If only one simulation specified in the config needs run, you can do so with:
```sh
<gem5 binary> <config script> --list # Lists the simulations by ID
<gem5 binary> <config script> <ID> # Run the simulation with the specified ID.
```
Example scripts of using MultiSim can be found in "configs/example/gem5_library/multisim".
### RISC-V Vector Extension Support
There have been significant improvements to the RVV support in gem5 including
* Fixed viota (#1137)
* Fixed vrgather (#1134)
* Added RVV FP16 support (#1123)
* Fixed widening and narrowing instructions (#1079)
* Fixed bug in vfmv.f.s (#863)
* Add unit stride segment loads and stores (#851) (#913)
* Added an new generator which can generate requests based on [spatter](https://github.com/hpcgarage/spatter) patterns.
* KVM is now supported in the gem5 Standard Library ARM Board.
* Generic Cache template added to the Standard Library (#745)
* Support added for partitioning caches.
* The Standard Library `obtain_resources` function can request multiple resources at once thus reducing delay associated with multiple requests.
* An official gem5 DevContainer has been added to the gem5 repository.
This can be used to build and run gem5 in consistent environment and enables GitHub Codespaces support.
### gem5 Python Statistics
The gem5 Python statistics API has been improved.
The gem5 Project's general intent with this improvement is make it easier and more desirable to obtain and interact with gem5 simulation statistics via Python.
For example, the following code snippet demonstrates how to obtain statistics from a gem5 simulation:
```python
fromm5.stats.gem5statsimportget_simstat
## Setup and run the configuation ...
simstat=get_simstat(board)
# Print the number of cycles the CPU at index 0 has executed.
print(simstat.cpu[0].numCycles)
# Strings can also be used to access statistics.
print(simstat['cpu'][0]['numCycles'])
# Print the total number of cycles executed by all CPUs.
We hope the usage of the gem5 Python statistics API will be more intuitive and easier to use while allowing better processing of statistical data.
### GPU Model
* Support for MI300X and MI200 GPU models including their features and most instructions.
* ROCm 6.1 disk image and compile docker files have been added. ROCm 5.4.2 and 4.2 resources are removed.
* The deprecated GCN3 ISA has been removed. Use VEGA instead.
## Bug Fixes
* An integer overflow error known to affect the `AddrRange` class has been fixed.
* Fix fflags behavior of floating point instruction in RISC-V for Out-of-Order CPUs.
### Arm FEAT_MPAM Support
An initial implementation of FEAT_MPAM has been introduced in gem5 with the capability to statically partition
classic caches. Guidance on how to use this is available on a Arm community [blog post](https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/gem5-cache-partitioning)
# Version 23.1
gem5 Version 23.1 is our first release where the development has been on GitHub.
During this release, there have been 362 pull requests merged which comprise 416 commits with 51 unique contributors.
## Significant API and user-facing changes
### The gem5 build can is now configured with `kconfig`
* Most gem5 builds without customized options (excluding double dash options) (e.g. , build/X86/gem5.opt) are backwards compatible and require no changes to your current workflows.
* All of the default builds in `build_opts` are unchanged and still available.
* However, if you want to specialize your build. For example, use customized ruby protocol. The command `scons PROTOCOL=<PROTOCAL_NAME> build/ALL/gem5.opt` will not work anymore. you now have to use `scons <kconfig command>` to update the ruby protocol as example. The double dash options (`--without-tcmalloc`, `--with-asan` and so on) are still continue to work as normal.
* For more details refer to the documentation here: [kconfig documentation](https://www.gem5.org/documentation/general_docs/kconfig_build_system/)
### Standard library improvements
#### `WorkloadResource` added to resource specialization
* The `Workload` and `CustomWorkload` classes are now deprecated. They have been transformed into wrappers for the `obtain_resource` and `WorkloadResource` classes in `resource.py`, respectively.
* Code utilizing the older API will continue to function as expected but will trigger a warning message. To update code using the `Workload` class, change the call from `Workload(id='resource_id', resource_version='1.0.0')` to `obtain_resource(id='resource_id', resource_version='1.0.0')`. Similarly, to update code using the `CustomWorkload` class, change the call from `CustomWorkload(function=func, parameters=params)` to `WorkloadResource(function=func, parameters=params)`.
* Workload resources in gem5 can now be directly acquired using the `obtain_resource` function, just like other resources.
#### Introducing Suites
Suites is a new category of resource being introduced in gem5. Documentation of suites can be found here: [suite documentation](https://www.gem5.org/documentation/gem5-stdlib/suites).
#### Other API changes
* All resource object now have their own `id` and `category`. Each resource class has its own `__str__()` function which return its information in the form of **category(id, version)** like **BinaryResource(id='riscv-hello', resource_version='1.0.0')**.
* Users can use GEM5_RESOURCE_JSON and GEM5_RESOURCE_JSON_APPEND env variables to overwrite all the data sources with the provided JSON and append a JSON file to all the data source respectively. More information can be found [here](https://www.gem5.org/documentation/gem5-stdlib/using-local-resources).
### Other user-facing changes
* Added support for clang 15 and clang 16
* gem5 no longer supports building on Ubuntu 18.04
* GCC 7, GCC 9, and clang 6 are no longer supported
* Two `DRAMInterface` stats have changed names (`bytesRead` and `bytesWritten`). For instance, `board.memory.mem_ctrl.dram.bytesRead` and `board.memory.mem_ctrl.dram.bytesWritten`. These are changed to `dramBytesRead` and `dramBytesWritten` so they don't collide with the stat with the same name in `AbstractMemory`.
* The stats for `NVMInterface` (`bytesRead` and `bytesWritten`) have been change to `nvmBytesRead` and `nvmBytesWritten` as well.
## Full-system GPU model improvements
* Support for up to latest ROCm 5.7.1.
* Various changes to enable PyTorch/TensorFlow simulations.
* New packer disk image script containing ROCm 5.4.2, PyTorch 2.0.1, and Tensorflow 2.11.
* GPU instructions can now perform atomics on host addresses.
* The provided configs scripts can now run KVM on more restrictive setups.
* Add support to checkpoint and restore between kernels in GPUFS, including adding various AQL, HSA Queue, VMID map, MQD attributes, GART translations, and PM4Queues to GPU checkpoints
* move GPU cache recorder code to RubyPort instead of Sequencer/GPUCoalescer to allow checkpointing to occur
* add support for flushing GPU caches, as well as cache cooldown/warmup support, for checkpoints
* Update vega10_kvm.py to add checkpointing instructions
## SE mode GPU model improvements
* started adding support for mmap'ing inputs for GPUSE tests, which reduces their runtime by 8-15% per run
## GPU model improvements
* update GPU VIPER and Coalescer support to ensure correct replacement policy behavior when multiple requests from the same CU are concurrently accessing the same line
* fix bug with GPU VIPER to resolve a race conflict for loads that bypass the TCP (L1D$)
* fix bug with MRU replacement policy updates in GPU SQC (I$)
* update GPU and Ruby debug prints to resolve various small errors
* Add configurable GPU L1,L2 num banks and L2 latencies
* Add decodings for new MI100 VOP2 insts
* Add GPU GLC Atomic Resource Constraints to better model how atomic resources are shared at GPU TCC (L2$)
* Update GPU tester to work with both requests that bypass all caches (SLC) and requests that bypass only the TCP (L1D$)
* Fixes for how write mask works for GPU WB L2 caches
* Added support for WB and WT GPU atomics
* Added configurable support to better model the latency of GPU atomic requests
* fix GPU's default number of HW barrier/CU to better model amount of concurrency GPU CUs should have
## RISC-V RVV 1.0 implemented
This was a huge undertaking by a large number of people!
Some of these people include Adrià Armejach who pushed it over the finish line, Xuan Hu who pushed the most recent version to gerrit that Adrià picked up,
Jerin Joy who did much of the initial work, and many others who contributed to the implementation including Roger Chang, Hoa Nguyen who put significant effort into testing and reviewing the code.
* Most of the instructions in the 1.0 spec implemented
* Works with both FS and SE mode
* Compatible with Simple CPUs, the O3, and the minor CPU models
* User can specify the width of the vector units
* Future improvements
* Widening/narrowing instructions are *not* implemented
* The model for executing memory instructions is not very high performance
* The statistics are not correct for counting vector instruction execution
## ArmISA changes/improvements
* Architectural support for the following extensions:
* FEAT_TLBIRANGE
* FEAT_FGT
* FEAT_TCR2
* FEAT_SCTLR2
* Arm support for SVE instructions improved
* Fixed some FEAT_SEL2 related issues:
* [Fix virtual interrupt logic in secure mode](https://github.com/gem5/gem5/pull/584)
* Improvements to the CHI coherence protocol implementation
* Far atomics implemented in CHI
* Ruby now supports using the prefetchers from the classic caches, if the protocol supports it. CHI has been extended to support the classic prefetchers.
* Bug in RISC-V TLB to fixed to correctly count misses and hits
* Added new [RISC-V Zcb instructions](https://github.com/gem5/gem5/pull/399)
* RISC-V can now use a separate binary for the bootloader and kernel in FS mode
* DRAMSys integration updated to latest DRAMSys version (5.0)
* Improved support for RISC-V privilege modes
* Fixed bug in switching CPUs with RISC-V
* CPU branch preditor refactoring to prepare for decoupled front end support
* Perf is now optional when using the KVM CPU model
* Improvements to the gem5-SST bridge including updating to SST 13.0
* Improved formatting of documentation in stdlib
* By default use isort for python imports in style
* Many, many testing improvements during the migration to GitHub actions
* Fixed the elastic trace replaying logic (TraceCPU)
## Known Bugs/Issues
* [RISC-V RVV Bad execution of riscv rvv vss instruction](https://github.com/gem5/gem5/issues/594)
* [Implement AVX xsave/xstor to avoid workaround when checkpointing](https://github.com/gem5/gem5/issues/434)
* [Adding Vector Segmented Loads/Stores to RISC-V V 1.0 implementation](https://github.com/gem5/gem5/issues/382)
* [Integer overflow in AddrRange subset check](https://github.com/gem5/gem5/issues/240)
* [RISCV64 TLB refuses to access upper half of physical address space](https://github.com/gem5/gem5/issues/238)
* [Bug when trying to restore checkpoints in SPARC: “panic: panic condition !pte occurred: Tried to execute unmapped address 0.”](https://github.com/gem5/gem5/issues/197)
* [BaseCache::recvTimingResp can trigger an assertion error from getTarget() due to MSHR in senderState having no targets](https://github.com/gem5/gem5/issues/100)
# Version 23.0.1.0
This minor release incorporates documentation updates, bug fixes, and some minor improvements.
@@ -70,10 +586,10 @@ Scons no longer defines the `DEBUG` guard in debug builds, so code making using
Also, this release:
- Removes deprecated namespaces. Namespace names were updated a couple of releases ago. This release removes the old names.
- Uses `MemberEventWrapper` in favor of `EventWrapper` for instance member functions.
- Adds an extension mechanism to `Packet` and `Request`.
- Sets x86 CPU vendor string to "HygoneGenuine" to better support GLIBC.
* Removes deprecated namespaces. Namespace names were updated a couple of releases ago. This release removes the old names.
* Uses `MemberEventWrapper` in favor of `EventWrapper` for instance member functions.
* Adds an extension mechanism to `Packet` and `Request`.
* Sets x86 CPU vendor string to "HygoneGenuine" to better support GLIBC.
## New features and improvements
@@ -105,6 +621,7 @@ Architectural support for Armv9 [Scalable Matrix extension](https://developer.ar
The implementation employs a simple renaming scheme for the Za array register in the O3 CPU, so that writes to difference tiles in the register are considered a dependency and are therefore serialized.
The following SVE and SIMD & FP extensions have also been implemented:
* FEAT_F64MM
* FEAT_F32MM
* FEAT_DOTPROD
@@ -127,32 +644,31 @@ gem5 can now use DRAMSys <https://github.com/tukl-msd/DRAMSys> as a DRAM backend
* Adds general RISC-V improvements to provide better stability.
### Standard library improvements and new components
This release:
- Adds MESI_Three_Level component.
- Supports ELFies and LoopPoint analysis output from Sniper.
- Supports DRAMSys in the stdlib.
* Adds MESI_Three_Level component.
* Supports ELFies and LoopPoint analysis output from Sniper.
* Supports DRAMSys in the stdlib.
## Bugfixes and other small improvements
This release also:
- Removes deprecated python libraries.
- Adds a DDR5 model.
- Adds AMD GPU MI200/gfx90a support.
- Changes building so it no longer "duplicates sources" in build/ which improves support for some IDEs and code analysis. If you still need to duplicate sources you can use the `--duplicate-sources` option to `scons`.
- Enables `--debug-activate=<object name>` to use debug trace for only a single SimObject (the opposite of `--debug-ignore`). See `--debug-help` for more information.
- Adds support to exit the simulation loop based on Arm-PMU events.
- Supports Python 3.11.
- Adds the idea of a CpuCluster to gem5.
* Removes deprecated python libraries.
* Adds a DDR5 model.
* Adds AMD GPU MI200/gfx90a support.
* Changes building so it no longer "duplicates sources" in build/ which improves support for some IDEs and code analysis. If you still need to duplicate sources you can use the `--duplicate-sources` option to `scons`.
* Enables `--debug-activate=<object name>` to use debug trace for only a single SimObject (the opposite of `--debug-ignore`). See `--debug-help` for more information.
* Adds support to exit the simulation loop based on Arm-PMU events.
* Supports Python 3.11.
* Adds the idea of a CpuCluster to gem5.
# Version 22.1.0.0
@@ -163,23 +679,23 @@ See below for more details!
## New features and improvements
- The gem5 binary can now be compiled to include multiple ISA targets.
* The gem5 binary can now be compiled to include multiple ISA targets.
A compilation of gem5 which includes all gem5 ISAs can be created using: `scons build/ALL/gem5.opt`.
This will use the Ruby `MESI_Two_Level` cache coherence protocol by default, to use other protocols: `scons build/ALL/gem5.opt PROTOCOL=<other protocol>`.
The classic cache system may continue to be used regardless as to which Ruby cache coherence protocol is compiled.
- The `m5` Python module now includes functions to set exit events are particular simululation ticks:
- *setMaxTick(tick)* : Used to to specify the maximum simulation tick.
- *getMaxTick()* : Used to obtain the maximum simulation tick value.
- *getTicksUntilMax()*: Used to get the number of ticks remaining until the maximum tick is reached.
- *scheduleTickExitFromCurrent(tick)* : Used to schedule an exit exit event a specified number of ticks in the future.
- *scheduleTickExitAbsolute(tick)* : Used to schedule an exit event as a specified tick.
- We now include the `RiscvMatched` board as part of the gem5 stdlib.
* The `m5` Python module now includes functions to set exit events are particular simululation ticks:
* *setMaxTick(tick)* : Used to to specify the maximum simulation tick.
* *getMaxTick()* : Used to obtain the maximum simulation tick value.
* *getTicksUntilMax()*: Used to get the number of ticks remaining until the maximum tick is reached.
* *scheduleTickExitFromCurrent(tick)* : Used to schedule an exit exit event a specified number of ticks in the future.
* *scheduleTickExitAbsolute(tick)* : Used to schedule an exit event as a specified tick.
* We now include the `RiscvMatched` board as part of the gem5 stdlib.
This board is modeled after the [HiFive Unmatched board](https://www.sifive.com/boards/hifive-unmatched) and may be used to emulate its behavior.
See "configs/example/gem5_library/riscv-matched-fs.py" and "configs/example/gem5_library/riscv-matched-hello.py" for examples using this board.
- An API for [SimPoints](https://doi.org/10.1145/885651.781076) has been added.
* An API for [SimPoints](https://doi.org/10.1145/885651.781076) has been added.
SimPoints can substantially improve gem5 Simulation time by only simulating representative parts of a simulation then extrapolating statistical data accordingly.
Examples of using SimPoints with gem5 can be found in "configs/example/gem5_library/checkpoints/simpoints-se-checkpoint.py" and "configs/example/gem5_library/checkpoints/simpoints-se-restore.py".
- "Workloads" have been introduced to gem5.
* "Workloads" have been introduced to gem5.
Workloads have been incorporated into the gem5 Standard library.
They can be used specify the software to be run on a simulated system that come complete with input parameters and any other dependencies necessary to run a simuation on the target hardware.
At the level of the gem5 configuration script a user may specify a workload via a board's `set_workload` function.
@@ -187,105 +703,104 @@ For example, `set_workload(Workload("x86-ubuntu-18.04-boot"))` sets the board to
This workload specifies a boot consisting of the Linux 5.4.49 kernel then booting an Ubunutu 18.04 disk image, to exit upon booting.
Workloads are agnostic to underlying gem5 design and, via the gem5-resources infrastructure, will automatically retrieve all necessary kernels, disk-images, etc., necessary to execute.
Examples of using gem5 Workloads can be found in "configs/example/gem5_library/x86-ubuntu-ruby.py" and "configs/example/gem5_library/riscv-ubuntu-run.py".
- To aid gem5 developers, we have incorporated [pre-commit](https://pre-commit.com) checks into gem5.
* To aid gem5 developers, we have incorporated [pre-commit](https://pre-commit.com) checks into gem5.
These checks automatically enforce the gem5 style guide on Python files and a subset of other requirements (such as line length) on altered code prior to a `git commit`.
Users may install pre-commit by running `./util/pre-commit-install.sh`.
Passing these checks is a requirement to submit code to gem5 so installation is strongly advised.
- A multiprocessing module has been added.
* A multiprocessing module has been added.
This allows for multiple simulations to be run from a single gem5 execution via a single gem5 configuration script.
Example of usage found [in this commit message](https://gem5-review.googlesource.com/c/public/gem5/+/63432).
**Note: This feature is still in development.
While functional, it'll be subject to subtantial changes in future releases of gem5**.
- The stdlib's `ArmBoard` now supports Ruby caches.
- Due to numerious fixes and improvements, Ubuntu 22.04 can be booted as a gem5 workload, both in FS and SE mode.
- Substantial improvements have been made to gem5's GDB capabilities.
- The `HBM2Stack` has been added to the gem5 stdlib as a memory component.
- The `MinorCPU` has been fully incorporated into the gem5 Standard Library.
- We now allow for full-system simulation of GPU applications.
* The stdlib's `ArmBoard` now supports Ruby caches.
* Due to numerious fixes and improvements, Ubuntu 22.04 can be booted as a gem5 workload, both in FS and SE mode.
* Substantial improvements have been made to gem5's GDB capabilities.
* The `HBM2Stack` has been added to the gem5 stdlib as a memory component.
* The `MinorCPU` has been fully incorporated into the gem5 Standard Library.
* We now allow for full-system simulation of GPU applications.
The introduction of GPU FS mode allows for the same use-cases as SE mode but reduces the requirement of specific host environments or usage of a Docker container.
The GPU FS mode also has improved simulated speed by functionally simulating memory copies, and provides an easier update path for gem5 developers.
An X86 host and KVM are required to run GPU FS mode.
## API (user facing) changes
- The default CPU Vendor String has been updated to `HygonGenuine`.
* The default CPU Vendor String has been updated to `HygonGenuine`.
This is due to newer versions of GLIBC being more strict about checking current system's supported features.
The previous value, `M5 Simulator`, is not recognized as a valid vendor string and therefore GLIBC returns an error.
- [The stdlib's `_connect_things` funciton call has been moved from the `AbstractBoard`'s constructor to be run as board pre-instantiation process](https://gem5-review.googlesource.com/c/public/gem5/+/65051).
* [The stdlib's `_connect_things` funciton call has been moved from the `AbstractBoard`'s constructor to be run as board pre-instantiation process](https://gem5-review.googlesource.com/c/public/gem5/+/65051).
This is to overcome instances where stdlib components (memory, processor, and cache hierarhcy) require Board information known only after its construction.
**This change breaks cases where a user utilizes the stdlib `AbstractBoard` but does not use the stdlib `Simulator` module. This can be fixed by adding the `_pre_instantiate` function before `m5.instantiate`**.
An exception has been added which explains this fix, if this error occurs.
- The setting of checkpoints has been moved from the stdlib's "set_workload" functions to the `Simulator` module.
* The setting of checkpoints has been moved from the stdlib's "set_workload" functions to the `Simulator` module.
Setting of checkpoints via the stdlib's "set_workload" functions is now deprecated and will be removed in future releases of gem5.
- The gem5 namespace `Trace` has been renamed `trace` to conform to the gem5 style guide.
- Due to the allowing of multiple ISAs per gem5 build, the `TARGET_ISA` variable has been replaced with `USE_$(ISA)` variables.
* The gem5 namespace `Trace` has been renamed `trace` to conform to the gem5 style guide.
* Due to the allowing of multiple ISAs per gem5 build, the `TARGET_ISA` variable has been replaced with `USE_$(ISA)` variables.
For example, if a build contains both the X86 and ARM ISAs the `USE_X86` and `USE_ARM` variables will be set.
## Big Fixes
- Several compounding bugs were causing bugs with floating point operations within gem5 simulations.
* Several compounding bugs were causing bugs with floating point operations within gem5 simulations.
These have been fixed.
- Certain emulated syscalls were behaving incorrectly when using RISC-V due to incorrect `open(2)` flag values.
* Certain emulated syscalls were behaving incorrectly when using RISC-V due to incorrect `open(2)` flag values.
These values have been fixed.
- The GIVv3 List register mapping has been fixed.
- Access permissions for GICv3 cpu registers have been fixed.
- In previous releases of gem5 the `sim_quantum` value was set for all cores when using the Standard Library.
* The GIVv3 List register mapping has been fixed.
* Access permissions for GICv3 cpu registers have been fixed.
* In previous releases of gem5 the `sim_quantum` value was set for all cores when using the Standard Library.
This caused issues when setting exit events at a particular tick as it resulted in the exit being off by `sim_quantum`.
As such, the `sim_quantum` value is only when using KVM cores.
- PCI ranges in `VExpress_GEM5_Foundation` fixed.
- The `SwitchableProcessor` processor has been fixed to allow switching to a KVM core.
* PCI ranges in `VExpress_GEM5_Foundation` fixed.
* The `SwitchableProcessor` processor has been fixed to allow switching to a KVM core.
Previously the `SwitchableProcessor` only allowed a user to switch from a KVM core to a non-KVM core.
- The Standard Library has been fixed to permit multicore simulations in SE mode.
- [A bug was fixed in the rcr X86 instruction](https://gem5.atlassian.net/browse/GEM5-1265).
* The Standard Library has been fixed to permit multicore simulations in SE mode.
* [A bug was fixed in the rcr X86 instruction](https://gem5.atlassian.net/browse/GEM5-1265).
## Build related changes
- gem5 can now be compiled with Scons 4 build system.
- gem5 can now be compiled with Clang version 14 (minimum Clang version 6).
- gem5 can now be compiled with GCC Version 12 (minimum GCC version 7).
* gem5 can now be compiled with Scons 4 build system.
* gem5 can now be compiled with Clang version 14 (minimum Clang version 6).
* gem5 can now be compiled with GCC Version 12 (minimum GCC version 7).
## Other minor updates
- The gem5 stdlib examples in "configs/example/gem5_library" have been updated to, where appropriate, use the stdlib's Simulator module.
* The gem5 stdlib examples in "configs/example/gem5_library" have been updated to, where appropriate, use the stdlib's Simulator module.
These example configurations can be used for reference as to how `Simulator` module may be utilized in gem5.
- Granulated SGPR computation has been added for gfx9 gpu-compute.
- The stdlib statistics have been improved:
- A `get_simstats` function has been added to access statistics from the `Simulator` module.
- Statistics can be printed: `print(simstats.board.core.some_integer)`.
- GDB ports are now specified for each workload, as opposed to per-simulation run.
- The `m5` utility has been expanded to include "workbegin" and "workend" annotations.
* Granulated SGPR computation has been added for gfx9 gpu-compute.
* The stdlib statistics have been improved:
* A `get_simstats` function has been added to access statistics from the `Simulator` module.
* Statistics can be printed: `print(simstats.board.core.some_integer)`.
* GDB ports are now specified for each workload, as opposed to per-simulation run.
* The `m5` utility has been expanded to include "workbegin" and "workend" annotations.
This can be added with `m5 workbegin` and `m5 workend`.
- A `PrivateL1SharedL2CacheHierarchy` has been added to the Standard Library.
- A `GEM5_USE_PROXY` environment variable has been added.
* A `PrivateL1SharedL2CacheHierarchy` has been added to the Standard Library.
* A `GEM5_USE_PROXY` environment variable has been added.
This allows users to specify a socks5 proxy server to use when obtaining gem5 resources and the resources.json file.
It uses the format `<host>:<port>`.
- The fastmodel support has been improved to function with Linux Kernel 5.x.
- The `set_se_binary_workload` function now allows for the passing of input parameters to a binary workload.
- A functional CHI cache hierarchy has been added to the gem5 Standard Library: "src/python/gem5/components/cachehierarchies/chi/private_l1_cache_hierarchy.py".
- The RISC-V K extension has been added.
* The fastmodel support has been improved to function with Linux Kernel 5.x.
* The `set_se_binary_workload` function now allows for the passing of input parameters to a binary workload.
* A functional CHI cache hierarchy has been added to the gem5 Standard Library: "src/python/gem5/components/cachehierarchies/chi/private_l1_cache_hierarchy.py".
* Added x86 bare metal workload and better real mode support
* [Added round-robin arbitration when using multiple prefetchers](https://gem5.atlassian.net/browse/GEM5-1169)
* [KVM Emulation added for ARM GIGv3](https://gem5.atlassian.net/browse/GEM5-1138)
* Many improvements to the CHI protocol
## Many RISC-V instructions added
@@ -362,7 +877,7 @@ An example gem5 configuration script using this board can be found in `configs/e
When the system is configured for NUMA, it has multiple memory ranges, and each memory range is mapped to a corresponding NUMA node. For this, the change enables `createAddrRanges` to map address ranges to only a given HNFs.
For instance, the `O3CPU` is now the `X86O3CPU` and `ArmO3CPU`, etc.
This requires a number of changes if you have your own CPU models.
See https://gem5-review.googlesource.com/c/public/gem5/+/52490 for details.
See [here](https://gem5-review.googlesource.com/c/public/gem5/+/52490) for details.
Additionally, this requires changes in any configuration script which inherits from the old CPU types.
@@ -384,31 +899,31 @@ Now, if you want to compile a CPU model for a particular ISA you will have to ad
If you have any specialized CPU models or any ISAs which are not in the mainline, expect many changes when rebasing on this release.
- No longer use read/setIntReg (e.g., see https://gem5-review.googlesource.com/c/public/gem5/+/49766)
- InvalidRegClass has changed (e.g., see https://gem5-review.googlesource.com/c/public/gem5/+/49745)
- All of the register classes have changed (e.g., see https://gem5-review.googlesource.com/c/public/gem5/+/49764/)
-`initiateSpecialMemCmd` renamed to `initiateMemMgmtCmd` to generalize to other command beyond HTM (e.g., DVM/TLBI)
-`OperandDesc` class added (e.g., see https://gem5-review.googlesource.com/c/public/gem5/+/49731)
- Many cases of `TheISA` have been removed
* No longer use read/setIntReg (e.g., see [link](https://gem5-review.googlesource.com/c/public/gem5/+/49766))
* InvalidRegClass has changed (e.g., see [link](https://gem5-review.googlesource.com/c/public/gem5/+/49745))
* All of the register classes have changed (e.g., see [link](https://gem5-review.googlesource.com/c/public/gem5/+/49764/))
*`initiateSpecialMemCmd` renamed to `initiateMemMgmtCmd` to generalize to other command beyond HTM (e.g., DVM/TLBI)
*`OperandDesc` class added (e.g., see [link](https://gem5-review.googlesource.com/c/public/gem5/+/49731))
* Many cases of `TheISA` have been removed
## Bug Fixes
- [Fixed RISC-V call/ret instruction decoding](https://gem5-review.googlesource.com/c/public/gem5/+/58209). The fix adds IsReturn` and `IsCall` flags for RISC-V jump instructions by defining a new `JumpConstructor` in "standard.isa". Jira Ticket here: https://gem5.atlassian.net/browse/GEM5-1139.
- [Fixed x86 Read-Modify-Write behavior in multiple timing cores with classic caches](https://gem5-review.googlesource.com/c/public/gem5/+/55744). Jira Ticket here: https://gem5.atlassian.net/browse/GEM5-1105.
- [The circular buffer for the O3 LSQ has been fixed](https://gem5-review.googlesource.com/c/public/gem5/+/58649). This issue affected running the O3 CPU with large workloaders. Jira Ticket here: https://gem5.atlassian.net/browse/GEM5-1203.
- [Resolved issues with Ruby's memtest](https://gem5-review.googlesource.com/c/public/gem5/+/56811). In gem5 v21.2, If the size of the address range was smaller than the maximum number of outstandnig requests allowed downstream, the tester would get stuck trying to find a unique address. This has been resolved.
* [Fixed RISC-V call/ret instruction decoding](https://gem5-review.googlesource.com/c/public/gem5/+/58209). The fix adds IsReturn` and `IsCall` flags for RISC-V jump instructions by defining a new `JumpConstructor` in "standard.isa". Jira Ticket [here](https://gem5.atlassian.net/browse/GEM5-1139).
* [Fixed x86 Read-Modify-Write behavior in multiple timing cores with classic caches](https://gem5-review.googlesource.com/c/public/gem5/+/55744). Jira Ticket [here](https://gem5.atlassian.net/browse/GEM5-1105).
* [The circular buffer for the O3 LSQ has been fixed](https://gem5-review.googlesource.com/c/public/gem5/+/58649). This issue affected running the O3 CPU with large workloaders. Jira Ticket [here](https://gem5.atlassian.net/browse/GEM5-1203).
* [Resolved issues with Ruby's memtest](https://gem5-review.googlesource.com/c/public/gem5/+/56811). In gem5 v21.2, If the size of the address range was smaller than the maximum number of outstandnig requests allowed downstream, the tester would get stuck trying to find a unique address. This has been resolved.
## Build-related changes
- Variable in `env` in the SConscript files now requires you to use `env['CONF']` to access them. Anywhere that `env['<VARIABLE>']` appeared should noe be `env['CONF']['<VARIABLE>']`
- Internal build files are now in a per-target `gem5.build` directory
- All build variable are per-target and there are no longer any shared variables.
* Variable in `env` in the SConscript files now requires you to use `env['CONF']` to access them. Anywhere that `env['<VARIABLE>']` appeared should noe be `env['CONF']['<VARIABLE>']`
* Internal build files are now in a per-target `gem5.build` directory
* All build variable are per-target and there are no longer any shared variables.
## Other changes
- New bootloader is required for Arm VExpress_GEM5_Foundation platform. See https://gem5.atlassian.net/browse/GEM5-1222 for details.
- The MemCtrl interface has been updated to use more inheritance to make extending it to other memory types (e.g., HBM pseudo channels) easier.
* New bootloader is required for Arm VExpress_GEM5_Foundation platform. See [here](https://gem5.atlassian.net/browse/GEM5-1222) for details.
* The MemCtrl interface has been updated to use more inheritance to make extending it to other memory types (e.g., HBM pseudo channels) easier.
# Version 21.2.1.1
@@ -446,28 +961,28 @@ This has now been wrapped in a larger "standard library".
The *gem5 standard library* is a Python package which contains the following:
- **Components:** A set of Python classes which wrap gem5's models. Some of the components are preconfigured to match real hardware (e.g., `SingleChannelDDR3_1600`) and others are parameterized. Components can be combined together onto *boards* which can be simulated.
- **Resources:** A set of utilities to interact with the gem5-resources repository/website. Using this module allows you to *automatically* download and use many of gem5's prebuilt resources (e.g., kernels, disk images, etc.).
- **Simulate:** *THIS MODULE IS IN BETA!* A simpler interface to gem5's simulation/run capabilities. Expect API changes to this module in future releases. Feedback is appreciated.
- **Prebuilt**: These are fully functioning prebuilt systems. These systems are built from the components in `components`. This release has a "demo" board to show an example of how to use the prebuilt systems.
* **Components:** A set of Python classes which wrap gem5's models. Some of the components are preconfigured to match real hardware (e.g., `SingleChannelDDR3_1600`) and others are parameterized. Components can be combined together onto *boards* which can be simulated.
* **Resources:** A set of utilities to interact with the gem5-resources repository/website. Using this module allows you to *automatically* download and use many of gem5's prebuilt resources (e.g., kernels, disk images, etc.).
* **Simulate:** *THIS MODULE IS IN BETA!* A simpler interface to gem5's simulation/run capabilities. Expect API changes to this module in future releases. Feedback is appreciated.
* **Prebuilt**: These are fully functioning prebuilt systems. These systems are built from the components in `components`. This release has a "demo" board to show an example of how to use the prebuilt systems.
Examples of using the gem5 standard library can be found in `configs/example/gem5_library/`.
The source code is found under `src/python/gem5`.
## Many Arm improvements
- [Improved configurability for Arm architectural extensions](https://gem5.atlassian.net/browse/GEM5-1132): we have improved how to enable/disable architectural extensions for an Arm system. Rather than working with indipendent boolean values, we now use a unified ArmRelease object modelling the architectural features supported by a FS/SE Arm simulation
- [Arm TLB can store partial entries](https://gem5.atlassian.net/browse/GEM5-1108): It is now possible to configure an ArmTLB as a walk cache: storing intermediate PAs obtained during a translation table walk.
- [Implemented a multilevel TLB hierarchy](https://gem5.atlassian.net/browse/GEM5-790): enabling users to compose/model a customizable multilevel TLB hierarchy in gem5. The default Arm MMU has now an Instruction L1 TLB, a Data L1 TLB and a Unified (Instruction + Data) L2 TLB.
- Provided an Arm example script for the gem5-SST integration (<https://gem5.atlassian.net/browse/GEM5-1121>).
* [Improved configurability for Arm architectural extensions](https://gem5.atlassian.net/browse/GEM5-1132): we have improved how to enable/disable architectural extensions for an Arm system. Rather than working with indipendent boolean values, we now use a unified ArmRelease object modelling the architectural features supported by a FS/SE Arm simulation
* [Arm TLB can store partial entries](https://gem5.atlassian.net/browse/GEM5-1108): It is now possible to configure an ArmTLB as a walk cache: storing intermediate PAs obtained during a translation table walk.
* [Implemented a multilevel TLB hierarchy](https://gem5.atlassian.net/browse/GEM5-790): enabling users to compose/model a customizable multilevel TLB hierarchy in gem5. The default Arm MMU has now an Instruction L1 TLB, a Data L1 TLB and a Unified (Instruction + Data) L2 TLB.
* Provided an Arm example script for the gem5-SST integration (<https://gem5.atlassian.net/browse/GEM5-1121>).
## GPU improvements
- Vega support: gfx900 (Vega) discrete GPUs are now both supported and tested with [gem5-resources applications](https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/gpu/).
- Improvements to the VIPER coherence protocol to fix bugs and improve performance: this improves scalability for large applications running on relatively small GPU configurations, which caused deadlocks in VIPER's L2. Instead of continually replaying these requests, the updated protocol instead wakes up the pending requests once the prior request to this cache line has completed.
- Additional GPU applications: The [Pannotia graph analytics benchmark suite](https://github.com/pannotia/pannotia) has been added to gem5-resources, including Makefiles, READMEs, and sample commands on how to run each application in gem5.
- Regression Testing: Several GPU applications are now tested as part of the nightly and weekly regressions, which improves test coverage and avoids introducing inadvertent bugs.
- Minor updates to the architecture model: We also added several small changes/fixes to the HSA queue size (to allow larger GPU applications with many kernels to run), the TLB (to create GCN3- and Vega-specific TLBs), adding new instructions that were previously unimplemented in GCN3 and Vega, and fixing corner cases for some instructions that were leading to incorrect behavior.
* Vega support: gfx900 (Vega) discrete GPUs are now both supported and tested with [gem5-resources applications](https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/gpu/).
* Improvements to the VIPER coherence protocol to fix bugs and improve performance: this improves scalability for large applications running on relatively small GPU configurations, which caused deadlocks in VIPER's L2. Instead of continually replaying these requests, the updated protocol instead wakes up the pending requests once the prior request to this cache line has completed.
* Additional GPU applications: The [Pannotia graph analytics benchmark suite](https://github.com/pannotia/pannotia) has been added to gem5-resources, including Makefiles, READMEs, and sample commands on how to run each application in gem5.
* Regression Testing: Several GPU applications are now tested as part of the nightly and weekly regressions, which improves test coverage and avoids introducing inadvertent bugs.
* Minor updates to the architecture model: We also added several small changes/fixes to the HSA queue size (to allow larger GPU applications with many kernels to run), the TLB (to create GCN3- and Vega-specific TLBs), adding new instructions that were previously unimplemented in GCN3 and Vega, and fixing corner cases for some instructions that were leading to incorrect behavior.
## gem5-SST bridges revived
@@ -488,13 +1003,13 @@ However, they should be simple to extend to other ISAs through small source chan
## Other improvements
- Removed master/slave terminology: this was a closed ticket which was marked as done even though there were multiple references of master/slave in the config scripts which we fixed.
- Armv8.2-A FEAT_UAO implementation.
- Implemented 'at' variants of file syscall in SE mode (<https://gem5.atlassian.net/browse/GEM5-1098>).
- Improved modularity in SConscripts.
- Arm atomic support in the CHI protocol
- Many testing improvements.
- New "tester" CPU which mimics GUPS.
* Removed master/slave terminology: this was a closed ticket which was marked as done even though there were multiple references of master/slave in the config scripts which we fixed.
* Armv8.2-A FEAT_UAO implementation.
* Implemented 'at' variants of file syscall in SE mode (<https://gem5.atlassian.net/browse/GEM5-1098>).
* Improved modularity in SConscripts.
* Arm atomic support in the CHI protocol
* Many testing improvements.
* New "tester" CPU which mimics GUPS.
# Version 21.1.0.2
@@ -509,7 +1024,7 @@ This hotfix initializes using loops which fixes the broken statistics.
# Version 21.1.0.0
Since v21.0 we have received 780 commits with 48 unique contributors, closing 64 issues on our [Jira Issue Tracker](https://gem5.atlassian.net/).
In addition to our [first gem5 minor release](#version-21.0.1.0), we have included a range of new features, and API changes which we outline below.
In addition to our first gem5 minor release, we have included a range of new features, and API changes which we outline below.
## Added the Components Library [Alpha Release]
@@ -568,7 +1083,7 @@ Classes that handle set dueling have been created ([Dueler and DuelingMonitor](h
They can be used in conjunction with different cache policies.
A [replacement policy that uses it](https://gem5.googlesource.com/public/gem5/+/refs/tags/v21.1.0.0/src/mem/cache/replacement_policies/dueling_rp.hh) has been added for guidance.
## RISC-V is now supported as a host machine.
## RISC-V is now supported as a host machine
gem5 is now compilable and runnable on a RISC-V host system.
@@ -578,6 +1093,7 @@ Deprecation MACROS have been added for deprecating namespaces (`GEM5_DEPRECATED_
**Note:**
For technical reasons, using old macros won't produce any deprecation warnings.
## Refactoring of the gem5 Namespaces
Snake case has been adopted as the new convention for name spaces.
@@ -656,9 +1172,9 @@ Version 21.0.1 is a minor gem5 release consisting of bug fixes. The 21.0.1 relea
* Fixes the [GCN-GPU Dockerfile](https://gem5.googlesource.com/public/gem5/+/refs/tags/v21.0.1.0/util/dockerfiles/gcn-gpu/Dockerfile) to pull from the v21-0 bucket.
* Fixes the tests to download from the v21-0 bucket instead of the develop bucket.
* Fixes the Temperature class:
* Fixes [fs_power.py](https://gem5.googlesource.com/public/gem5/+/refs/tags/v21.0.1.0/configs/example/arm/fs_power.py), which was producing a ["Temperature is not JSON serializable" error](https://gem5.atlassian.net/browse/GEM5-951).
* Fixes temperature printing in `config.ini`.
* Fixes the pybind export for the `from_kelvin` function.
* Fixes [fs_power.py](https://gem5.googlesource.com/public/gem5/+/refs/tags/v21.0.1.0/configs/example/arm/fs_power.py), which was producing a ["Temperature is not JSON serializable" error](https://gem5.atlassian.net/browse/GEM5-951).
* Fixes temperature printing in `config.ini`.
* Fixes the pybind export for the `from_kelvin` function.
* Eliminates a duplicated name warning in [ClockTick](https://gem5.googlesource.com/public/gem5/+/refs/tags/v21.0.1.0/src/systemc/channel/sc_clock.cc).
* Fixes the [Ubuntu 18.04 Dockerfile](https://gem5.googlesource.com/public/gem5/+/refs/tags/v21.0.1.0/util/dockerfiles/ubuntu-20.04_all-dependencies/Dockerfile) to use Python3 instead of Python2.
* Makes [verify.py](https://gem5.googlesource.com/public/gem5/+/refs/tags/v21.0.1.0/src/systemc/tests/verify.py) compatible with Python3.
@@ -769,7 +1285,7 @@ This bug affected the overall behavior of the Garnet Network Model.
# Version 20.1.0.0
Thank you to everyone that made this release possible!
This has been a very productive release with [150 issues](https://gem5.atlassian.net/), over 650 commits (a 25% increase from the 20.0 release), and 58 unique contributors (a 100% increase!).
This has been a very productive release with [150 issues](https://gem5.atlassian.net/), over 650 commits (a 25% increase from the 20.0 release), and 58 unique contributors (a 100% increase!).
## Process changes
@@ -853,7 +1369,7 @@ See <http://www.gem5.org/documentation/general_docs/building> for gem5's current
* There may be some bugs introduced with this change as there were many places in the Python configurations which relied on "duck typing".
* This change is mostly backwards compatible and warning will be issued until at least gem5-20.2.
```
```txt
MasterPort -> RequestorPort
SlavePort -> ResponsePort
@@ -927,7 +1443,7 @@ Below are some of the highlights, though I'm sure I've missed some important cha
* All full system config/run scripts must be updated (e.g., anything that used the `LinuxX86System` or similar SimObject).
* Many of the parameters of `System` are now parameters of the `Workload` (see `src/sim/Workload.py`).
* For instance, many parameters of `LinuxX86System` are now part of `X86FsLinux` which is now the `workload` parameter of the `System` SimObject.
* See https://gem5-review.googlesource.com/c/public/gem5/+/24283/ and https://gem5-review.googlesource.com/c/public/gem5/+/26466 for more details.
* See [here](https://gem5-review.googlesource.com/c/public/gem5/+/24283/) and [here](https://gem5-review.googlesource.com/c/public/gem5/+/26466) for more details.
* Sv39 paging has been added to the RISC-V ISA, bringing gem5 close to running Linux on RISC-V.
# The after_boot.sh script will run a script if it is passed via
# m5 readfile. This is the last exit event before the simulation exits.
yieldTrue
simulator=Simulator(
board=board,
on_exit_event={
# Here we want override the default behavior for the first m5 exit
# exit event.
ExitEvent.EXIT:exit_event_handler()
},
)
simulator.run()
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.