Symbol type is part of the info provided by an ELF object's symtab.
It indicates whether a symbol is a file symbol, or a function symbol,
etc.
This chain of commits introduces a way to load only function symbols
into gem5's symbol table. The RISC-V BootloaderKernelWorkload now
loads only function symbols from the bootloader and the kernel binaries
by default.
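The following is a minimal C++ sketch of filtering symbols by type. It assumes raw ELF `st_info` bytes as input. `ElfSym` and `functionSymbolsOnly` are hypothetical names and are not gem5's loader API. Only the type encoding comes from the ELF specification: the type is stored in the low 4 bits of `st_info`, with `STT_OBJECT` = 1, `STT_FUNC` = 2, and `STT_FILE` = 4.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// ELF symbol types live in the low 4 bits of st_info (values from the ELF spec).
constexpr std::uint8_t kSttObject = 1;
constexpr std::uint8_t kSttFunc = 2;
constexpr std::uint8_t kSttFile = 4;

// Hypothetical, simplified symbol record (not gem5's loader::Symbol).
struct ElfSym {
    std::string name;
    std::uint64_t address;
    std::uint8_t info;  // raw st_info byte: (binding << 4) | type
};

// Same computation as the ELF64_ST_TYPE() macro from <elf.h>.
constexpr std::uint8_t symType(std::uint8_t info) { return info & 0xf; }

// Keep only function symbols and drop file, object and other symbol types.
std::vector<ElfSym> functionSymbolsOnly(const std::vector<ElfSym>& all)
{
    std::vector<ElfSym> out;
    for (const auto& sym : all) {
        if (symType(sym.info) == kSttFunc)
            out.push_back(sym);
    }
    return out;
}
```

For example, given a file symbol, a global function symbol (`st_info` 0x12), and a global object symbol (`st_info` 0x11), only the function symbol is kept.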
arch-riscv: Fix implementation of CMO extension instructions
This change introduces a template for the memory access of store
instructions. The new template is called CacheBlockBasedStore.
The reasons for not reusing the current Store's mem access template
are as follows:
- The CMO extension instructions operate at cache block granularity,
while regular load/store instructions operate on data of 64 bits or
fewer.
- The writeMemAtomicLE/writeMemTimingLE interfaces do not allow passing
nullptr as data. However, CPUs in gem5 rely on (data == NULL) to detect
CACHE_BLOCK_ZERO instructions. Assigning `Mem = 0;` to the
`uint64_t Mem;` variable does not solve the problem, because the
variable is allocated and thus `&Mem != NULL` always holds. This change
uses the writeMemAtomic/writeMemTiming interfaces instead.
- Per CMO v1.0.1, the instructions in the spec do not generate
address-misaligned faults.
- The CMO extension instructions do not use IMM.
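A toy memory model can illustrate both the (data == NULL) convention and the absence of misaligned faults. This is a sketch under stated assumptions: `FlatMemory` and `writeBlock` are hypothetical and are not gem5's ExecContext or CPU interfaces. The block size is also assumed to be a power of two.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical flat memory model, not gem5's ExecContext/CPU interface.
// A null data pointer marks a cache-block-zero request.
class FlatMemory {
  public:
    FlatMemory(std::size_t size, std::uint8_t fill) : bytes_(size, fill) {}

    // Writes a whole cache block. The address is aligned down to the block
    // boundary instead of raising a misaligned fault.
    // blockSize is assumed to be a power of two.
    void writeBlock(std::uint64_t addr, const std::uint8_t* data,
                    std::size_t blockSize)
    {
        const std::uint64_t base =
            addr & ~static_cast<std::uint64_t>(blockSize - 1);
        auto first = bytes_.begin() + static_cast<std::ptrdiff_t>(base);
        if (data == nullptr)
            std::fill(first, first + static_cast<std::ptrdiff_t>(blockSize), 0);
        else
            std::copy(data, data + blockSize, first);
    }

    std::uint8_t at(std::size_t i) const { return bytes_.at(i); }

  private:
    std::vector<std::uint8_t> bytes_;
};
```

A zeroing request to address 70 with 64-byte blocks clears bytes 64 to 127 and leaves the neighbouring blocks untouched.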
---
arch-riscv: Fix generateDisassembly for Store with 1 source reg
Currently, store instructions are assumed to have two source registers.
However, the RISC-V CMO instructions, which we now support, are
implemented as Store instructions in gem5 but have only one source
register. This change allows printing the disassembly of Store
instructions with one source register.
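The following sketches a formatter that handles both cases. `disassembleStore` is hypothetical, and its output format is illustrative rather than exactly what gem5 prints. It assumes only what the RISC-V store encoding implies: the first source is the base register, and the second source, if present, is the data register.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical formatter, not gem5's Store::generateDisassembly.
// srcRegs[0] is the base address register (rs1); srcRegs[1], if present,
// is the data register (rs2).
std::string disassembleStore(const std::string& mnemonic,
                             const std::vector<std::string>& srcRegs,
                             std::int64_t offset)
{
    std::ostringstream ss;
    ss << mnemonic << ' ';
    if (srcRegs.size() >= 2) {
        // Regular store: "mnemonic rs2, offset(rs1)".
        ss << srcRegs[1] << ", " << offset << '(' << srcRegs[0] << ')';
    } else {
        // Single-source store (e.g. a CMO instruction): there is no data
        // register and no immediate, so only the base register is printed.
        ss << '(' << srcRegs.at(0) << ')';
    }
    return ss.str();
}
```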
---
arch-riscv: Make Zicbom/Zicboz extensions optional in FS mode
Currently, we enable Zicbom/Zicboz by default. Since those
extensions might be buggy as they are not well tested, making
those extensions optional allows running simulations where
the performance implications of the instructions do not matter.
Effectively, turning off the extensions simply removes them
from the device tree, so the OS will not use them.
It does not, however, prevent userspace applications from
using those instructions.
---
arch-riscv: Add all supported Z extensions to RISC-V ISA string
mem-ruby: update RubyRequest print to include GPU fields
The print function used for RubyRequests did not include the
GPU-specific fields (for the GLC and SLC bits, which are cache modifiers
that specify what level of the memory hierarchy a request should be
performed at). This causes confusion when the GPU Ruby SLICC code prints
out RubyRequest messages, since important fields are missing.
Thus this commit adds that support. Since these fields are already part
of the RubyRequest class, and are always 0 for non-GPU requests, it
should not affect other components beyond slightly longer prints.
Change-Id: I31c9122b82dfa2c6415ce25d225ea82cb35c7333
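The following is a minimal sketch of a print routine that includes the GLC and SLC fields. `ToyRubyRequest` is a hypothetical, trimmed-down stand-in for RubyRequest, and the output format is illustrative only.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical, trimmed-down request; the real RubyRequest has more fields.
struct ToyRubyRequest {
    std::uint64_t lineAddress;
    bool glc;  // GPU cache modifier; always false for non-GPU requests
    bool slc;  // GPU cache modifier; always false for non-GPU requests

    void print(std::ostream& out) const
    {
        out << "[ToyRubyRequest: LineAddress=0x" << std::hex << lineAddress
            << std::dec << " GLC=" << glc << " SLC=" << slc << ']';
    }
};

inline std::ostream& operator<<(std::ostream& out, const ToyRubyRequest& req)
{
    req.print(out);
    return out;
}
```

Because non-GPU requests carry GLC=0 and SLC=0, the only visible change for them is a slightly longer line.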
Made the following changes to fix the behavior of GLC atomics in a WB
L2:
- Stored the atomic write mask in the TBE for GLC atomics on an invalid
line that bypass to the directory but have their atomics performed on
the return path.
- Replaced the !presentOrAvail() check for bypassing atomics to the
directory (the atomics are then performed on the return path) with a
check for the invalid line state.
- Replaced the wdb_writeDirtyBytes action used when performing atomics
with the owm_orWriteMask action, which doesn't write from the invalid
atomic request data block.
- Fixed atomic return path actions
Change-Id: I6a406c313d2f9c88cd75bfe39187ef94ce84098f
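A byte-granular write mask shows the difference between the two actions. `ToyTbe`, `writeDirtyBytes`, and `orWriteMask` are hypothetical C++ stand-ins for the SLICC TBE and the wdb_writeDirtyBytes/owm_orWriteMask actions, and a 64-byte line is assumed.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kBlockBytes = 64;  // assumed line size
using WriteMask = std::bitset<kBlockBytes>;
using DataBlock = std::array<std::uint8_t, kBlockBytes>;

// Hypothetical stand-in for a TBE; not the SLICC-generated type.
struct ToyTbe {
    WriteMask writeMask;
    DataBlock data{};
};

// "Write dirty bytes" style: copies the masked bytes from the request data,
// which is wrong when the request data block is not valid (as for atomics).
void writeDirtyBytes(ToyTbe& tbe, const DataBlock& reqData,
                     const WriteMask& reqMask)
{
    for (std::size_t i = 0; i < kBlockBytes; ++i) {
        if (reqMask.test(i))
            tbe.data[i] = reqData[i];
    }
    tbe.writeMask |= reqMask;
}

// "OR write mask" style: only records which bytes the atomic touches and
// leaves the TBE data untouched.
void orWriteMask(ToyTbe& tbe, const WriteMask& reqMask)
{
    tbe.writeMask |= reqMask;
}
```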
This change updates the gem5 SST bridge to call m5.instantiate() in the
gem5 config script instead of in the SST component. This allows more
flexibility for the gem5-SST setup, as we can now write traffic
generators using the bridge.
Previously the GPU L1 I$ (SQC) was not updating the MRU information on
hits in the SQC. This commit resolves that by adding support to the
appropriate Ruby transition.
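The following sketch shows why missing MRU updates matter, assuming a single LRU-managed set. `LruSet` is hypothetical and is not Ruby's replacement policy code. The `touchOnHit` flag switches between the old behaviour and the fixed behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <optional>

// Hypothetical single cache set with LRU replacement; not Ruby's
// replacement policy classes. The front of the list is the MRU entry.
class LruSet {
  public:
    LruSet(std::size_t ways, bool touchOnHit)
        : ways_(ways), touchOnHit_(touchOnHit) {}

    // Accesses addr and returns the evicted address, if any.
    std::optional<std::uint64_t> access(std::uint64_t addr)
    {
        for (auto it = lines_.begin(); it != lines_.end(); ++it) {
            if (*it == addr) {
                // Hit: promote to MRU only if hits update replacement state.
                if (touchOnHit_)
                    lines_.splice(lines_.begin(), lines_, it);
                return std::nullopt;
            }
        }
        std::optional<std::uint64_t> victim;
        if (lines_.size() == ways_) {
            victim = lines_.back();  // least recently used entry
            lines_.pop_back();
        }
        lines_.push_front(addr);
        return victim;
    }

  private:
    std::size_t ways_;
    bool touchOnHit_;
    std::list<std::uint64_t> lines_;
};
```

Take two ways and the access sequence A, B, A, C. The fixed set evicts B, while the set that ignores hits evicts A, even though A was just used.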
Previously, opening a config file (such as
`configs/example/hmc_hello.py`) containing non-ASCII characters caused
a UnicodeDecodeError.
This change also switches to a more idiomatic context manager for
handling files.
Change-Id: Ia39cbe2c420e9c94f3a84af459b7e5f4d9718d14
compression::Base::getSize() performed an integer division,
which rounded down instead of up.
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
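The rounding issue can be sketched as follows, assuming the function converts a size in bits into bytes. Both helpers are hypothetical. The fix replaces the truncating division with a ceiling division.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Truncating version: integer division rounds down, so 9 bits -> 1 byte.
constexpr std::uint64_t bitsToBytesTruncating(std::uint64_t bits)
{
    return bits / CHAR_BIT;
}

// Rounding up: any partial byte still needs a whole byte of storage.
// Assumes bits is far from the uint64_t limit, so bits + 7 cannot overflow.
constexpr std::uint64_t bitsToBytesCeil(std::uint64_t bits)
{
    return (bits + CHAR_BIT - 1) / CHAR_BIT;
}
```

`(bits + CHAR_BIT - 1) / CHAR_BIT` is the usual integer idiom for a ceiling division. It avoids a round trip through floating point.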
SST SimpleMem will be deprecated in SST 14. PR 396 updated the
bridge to use StandardMem, which is the new memory interface in
SST. This change removes all references to SimpleMem.
Change-Id: I6e4d645317d95ebb610e3dfc93a30d53b91b6b5d
Signed-off-by: Kaustav Goswami <kggoswami@ucdavis.edu>
This change updates the gem5 SST bridge to call m5.instantiate()
in the gem5 config script instead of in the SST component. This
allows more flexibility for the gem5-SST setup, as we can now write
traffic generators using the bridge.
Change-Id: I510a8c15f8fb00bdbdd60dafa2d9f5ad011e48f2
Signed-off-by: Kaustav Goswami <kggoswami@ucdavis.edu>
GPR allocation uses fields in the AMD kernel code structure that
are not backwards compatible and are not populated by more recent
compiler versions. Use the granulated fields instead, which are required
to be backwards compatible.
Change-Id: I718716226f5dbeb08369d5365d5e85b029027932
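The following sketches how a granulated VGPR count maps back to a register count. The granule of 4 is an assumption taken from the LLVM AMDGPU documentation of GRANULATED_WORKITEM_VGPR_COUNT for GFX6-GFX9, and other targets may use a different granule. Both helpers are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed encoding (GFX6-GFX9 per the LLVM AMDGPU docs):
// GRANULATED_WORKITEM_VGPR_COUNT = max(0, ceil(vgprs / 4) - 1).
constexpr std::uint32_t kVgprGranule = 4;

// What the compiler writes into the granulated field.
constexpr std::uint32_t encodeGranulatedVgprs(std::uint32_t vgprs)
{
    const std::uint32_t blocks = (vgprs + kVgprGranule - 1) / kVgprGranule;
    return blocks == 0 ? 0 : blocks - 1;
}

// Number of VGPRs to allocate when decoding the granulated field.
// The result is always a multiple of the granule.
constexpr std::uint32_t decodeGranulatedVgprs(std::uint32_t granulated)
{
    return (granulated + 1) * kVgprGranule;
}
```

For example, a kernel using 17 VGPRs encodes as 4, which decodes to 20 VGPRs. Decoding the granulated field may over-allocate slightly but never under-allocates.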
This fixes occasional readBlob fatals caused by the functional read of
system memory, seen often with the KVM CPU.
Change-Id: Ifccee666f62faa5b2fcf0a64a9d77c8cf95b3add
The amdgpu driver can, at *any* time, tell the device to unmap a queue
to force the queue descriptor to be written back to main memory in the
form of a memory queue descriptor (MQD). It will then immediately remap
the queue and continue writing the doorbell to the queue. It is possible
that the doorbell write occurs after the queue is unmapped but before it
is remapped. In this situation, we need to check the updated value of
the doorbell for the queue and write that to the queue after it is
mapped.
To handle this, a pending doorbell packet map is created to hold a
packet to replay when the queue is mapped. Because PCI in gem5
implements only the atomic protocol port, we cannot use the original
packet, as it must respond in the same Tick. This patch fixes the
doorbell maps not being cleared on unmapping, so that a doorbell write
to an unmapped queue is not found in writeDoorbell and is instead placed
in the pending doorbell map. This includes fixing the doorbell offset
value in the doorbell-to-VMID map, which is now multiplied by four as it
is a dword address.
This was tested using TensorFlow 2.0's MNIST example, which was hitting
this issue consistently. With this patch, it now makes progress and
issues pending doorbell writes.
Change-Id: Ic6b401d3fe7fc46b7bcbf19a769cdea6814e7d1e
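The bookkeeping described above can be sketched as follows. `DoorbellModel` is hypothetical and, for simplicity, stores the latest doorbell value rather than a packet. Doorbell offsets are taken as dword indices and multiplied by four, as described above.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>

// Hypothetical model of the doorbell bookkeeping; names do not match
// gem5's AMDGPU classes.
class DoorbellModel {
  public:
    // Doorbell offsets arrive as dword indices and become byte offsets.
    void mapQueue(std::uint32_t doorbellDword, int queueId)
    {
        const std::uint64_t offset = std::uint64_t{doorbellDword} * 4;
        doorbellToQueue_[offset] = queueId;
        // Replay a doorbell write that arrived while the queue was unmapped.
        auto it = pending_.find(offset);
        if (it != pending_.end()) {
            wptr_[queueId] = it->second;
            pending_.erase(it);
        }
    }

    // Unmapping must remove the doorbell so later writes become pending.
    void unmapQueue(std::uint64_t offset) { doorbellToQueue_.erase(offset); }

    void writeDoorbell(std::uint64_t offset, std::uint64_t value)
    {
        auto it = doorbellToQueue_.find(offset);
        if (it != doorbellToQueue_.end())
            wptr_[it->second] = value;
        else
            pending_[offset] = value;  // keep only the latest value
    }

    std::optional<std::uint64_t> wptr(int queueId) const
    {
        auto it = wptr_.find(queueId);
        if (it == wptr_.end())
            return std::nullopt;
        return it->second;
    }

    bool hasPending(std::uint64_t offset) const
    {
        return pending_.count(offset) != 0;
    }

  private:
    std::unordered_map<std::uint64_t, int> doorbellToQueue_;
    std::unordered_map<std::uint64_t, std::uint64_t> pending_;
    std::unordered_map<int, std::uint64_t> wptr_;
};
```

Storing a plain value instead of a packet is just a way to express the replay in isolation. The original patch keeps a packet because the atomic-only PCI port forbids delaying the original one.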
gem5 does not currently implement any vendor-specific HSA packets.
Starting in ROCm 5.5, vendor packets appear to end with a completion
signal. Not sending this completion causes gem5 to hang. Since these
packets are not documented anywhere and need to be reverse engineered,
we send the completion signal, if non-zero, and then finish the packet
as gem5 currently does.
Testing: HIP examples work on the most recent ROCm release (5.7.1).
Change-Id: Id0841407bec564c84f590c943f0609b17e01e14c
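The following sketches how a vendor-specific packet could be finished. It assumes, as observed above, that the completion signal handle occupies the last 8 bytes of the 64-byte AQL packet, which is the same offset as in a kernel dispatch packet. It also assumes a little-endian host. `SignalTable` and `finishVendorPacket` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_map>

// AQL packets are 64 bytes. Assumption: vendor packets keep the completion
// signal handle in their last 8 bytes, like the kernel dispatch packet.
constexpr std::size_t kAqlPacketBytes = 64;
constexpr std::size_t kCompletionSignalOffset = 56;

using AqlPacket = std::array<std::uint8_t, kAqlPacketBytes>;

// Hypothetical signal store: signal handle -> signal value.
using SignalTable = std::unordered_map<std::uint64_t, std::int64_t>;

void finishVendorPacket(const AqlPacket& pkt, SignalTable& signals)
{
    std::uint64_t handle = 0;
    // Reads the handle in host byte order (assumed little-endian, like AQL).
    std::memcpy(&handle, pkt.data() + kCompletionSignalOffset, sizeof(handle));
    // A zero handle means there is no completion signal to notify.
    if (handle != 0)
        signals[handle] -= 1;  // completion signals are decremented
}
```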
Currently, we enable Zicbom/Zicboz by default. Since those
extensions might be buggy as they are not well tested, making
those extensions optional allows running simulations where
the performance implications of the instructions do not matter.
Effectively, turning off the extensions simply removes them
from the device tree, so the OS will not use them.
It does not, however, prevent userspace applications from
using those instructions.
Change-Id: Ib30e98c4c39f741dec5f7d31bd7b832391686840
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
Currently, store instructions are assumed to have two source registers.
However, the RISC-V CMO instructions, which we now support, are
implemented as Store instructions in gem5 but have only one source
register. This change allows printing the disassembly of Store
instructions with one source register.
Change-Id: I4dd7818c9ac8a89d5e10e77db72248942a25e938
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
This change introduces a template for the memory access of store
instructions. The new template is called CacheBlockBasedStore.
The reasons for not reusing the current Store's mem access template
are as follows:
- The CMO extension instructions operate at cache block granularity,
while regular load/store instructions operate on data of 64 bits or
fewer.
- The writeMemAtomicLE/writeMemTimingLE interfaces do not allow passing
nullptr as data. However, CPUs in gem5 rely on (data == NULL) to detect
CACHE_BLOCK_ZERO instructions. Assigning `Mem = 0;` to the
`uint64_t Mem;` variable does not solve the problem, because the
variable is allocated and thus `&Mem != NULL` always holds. This change
uses the writeMemAtomic/writeMemTiming interfaces instead.
- Per CMO v1.0.1, the instructions in the spec do not generate
address-misaligned faults.
- The CMO extension instructions do not use IMM.
Change-Id: I323615639a4ba882fe40a55ed32c7632e0251421
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
Symbol type is part of the info provided by an ELF object's symtab.
It indicates whether a symbol is a file symbol, or a function symbol, etc.
Change-Id: I827e79f8439c47ac9e889734aaf354c653aff530
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
Currently, we are hardcoding the ISA string in the device tree
generator. The ISA string from the device tree affects which
ISA extensions will be used by the bootloader/kernel.
The new function generates the ISA string from gem5's
ISA object rather than using hardcoded values.
This series of changes also corrects a couple of hardcoded
RISC-V ISA strings in the standard library, and ensures that
RVV instructions are not enabled for the U74 core model.
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
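The following sketches how an ISA string can be generated from feature flags instead of being hardcoded. `IsaFeatures` and `isaString` are hypothetical stand-ins for gem5's ISA object and the new generator, and the fixed "imafdc" base is an assumption. As in RISC-V ISA naming strings, multi-letter Z extensions follow the single-letter ones, each separated by an underscore.

```cpp
#include <cassert>
#include <string>

// Hypothetical feature flags standing in for gem5's RISC-V ISA object.
struct IsaFeatures {
    bool rv64 = true;
    bool vector = false;  // "V"
    bool zicbom = false;
    bool zicboz = false;
};

// Builds a device-tree style ISA string: single-letter extensions first,
// then each multi-letter Z extension prefixed with '_'.
std::string isaString(const IsaFeatures& f)
{
    std::string isa = f.rv64 ? "rv64" : "rv32";
    isa += "imafdc";  // assumed always-present extensions
    if (f.vector)
        isa += 'v';
    if (f.zicbom)
        isa += "_zicbom";
    if (f.zicboz)
        isa += "_zicboz";
    return isa;
}
```

Leaving `vector` unset for a U74-like core keeps "v" out of the string. Turning off Zicbom/Zicboz likewise removes them from what the device tree advertises.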
This PR fixes https://github.com/gem5/gem5/issues/449 by applying
the following changes:
1) Setting alloc_on_atomic=False in the stdlib
This is directly related to the error message reported in issue #449.
2) Disabling far atomics in the stdlib with policy type = 0
There is an invalid transaction error, likely caused by the fact that
the current implementation expects a two-level cache hierarchy, whereas
the stdlib example allocates only one level of caches (L1). This needs
further investigation.
3) Explicitly clearing the atomic log
Even with far atomics disabled, atomicPartial was populating the atomic
log queue without ever clearing it. This leak caused the Linux OOM
killer to kill gem5 once the physical resources of the machine were
exhausted. IMHO the atomic log interface should be revamped so that
atomic users allocate the atomic log only when explicitly needed.
Currently, the kernel's symbols are shifted by `kernel_paddr_offset`,
which is where the kernel is located in the physical address space.
However, the symbols are mapped to virtual addresses, which stay the
same even though the physical address space is shifted.
This patch removes the offset from the virtual addresses of the
kernel's symbols.
Change-Id: I7c35f925777220f56bd8c69bba14c267d2048ade
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
As pointed out by [1], Arm doesn't seem to respect the cacheability
attribute when mapping uncacheable memory. This is because the request
is not tagged as uncacheable during SE translation. With this patch, we
check the cacheability attribute before finalizing the translation.
[1]: https://github.com/gem5/gem5/issues/509
Change-Id: I42df0e119af61763971d5766ae764a540055781b
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
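The following sketches the fix, assuming a page table entry that records an uncacheable attribute. `ToyRequest`, `PageEntry`, and `finalizeTranslation` are hypothetical, and gem5's own request and page table types differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical request flag; not gem5's Request flag set.
enum : std::uint32_t { kReqUncacheable = 1u << 0 };

struct ToyRequest {
    std::uint64_t paddr = 0;
    std::uint32_t flags = 0;
    bool isUncacheable() const { return (flags & kReqUncacheable) != 0; }
};

// Hypothetical page table entry carrying a cacheability attribute.
struct PageEntry {
    std::uint64_t paddrBase;
    bool uncacheable;
};

// SE-mode style translation: map the address, then propagate the page's
// cacheability attribute to the request before translation completes.
void finalizeTranslation(ToyRequest& req, std::uint64_t vaddr,
                         std::uint64_t pageMask, const PageEntry& pte)
{
    req.paddr = pte.paddrBase | (vaddr & pageMask);
    if (pte.uncacheable)
        req.flags |= kReqUncacheable;
}
```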
This is a temporary solution to fix daily tests. We could revert
to the default (policy_type = 1) once the problem is properly
fixed.
Change-Id: Ia80af9a7d84d5c777ddeb441110a91a1680c1030
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The new far atomics implementation [1] did not take into account that
it had to manually clear the atomic log. This caused a memory leak,
as the log queue kept growing and was never cleaned up.
[1]: https://github.com/gem5/gem5/pull/177
Change-Id: I4a74fbf15d21e35caec69c29117e2d98cc86d5ff
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
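The leak pattern can be sketched as follows. Every atomic appends to a log, so callers that never consume the log must clear it. `ToyAtomicBlock` and its methods are hypothetical and are not Ruby's DataBlock interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical block that logs the old value of every atomic it performs.
class ToyAtomicBlock {
  public:
    // Atomic add that records the pre-operation value in the log.
    void atomicAdd(std::uint64_t delta)
    {
        log_.push_back(value_);
        value_ += delta;
    }

    // Callers that do not need the log must clear it, or it grows forever.
    void clearAtomicLog() { log_.clear(); }

    std::size_t logSize() const { return log_.size(); }
    std::uint64_t value() const { return value_; }

  private:
    std::uint64_t value_ = 0;
    std::deque<std::uint64_t> log_;
};
```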
gem5 will otherwise fatal with the error message:
fatal: ... alloc_on_atomic without default or user set value
See GitHub issue [1] for further details.
[1]: https://github.com/gem5/gem5/issues/449
Change-Id: I0bb8fccf0ac6d60fc6c1229436a35e91b2fb45cd
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Major refactoring of the branch predictor unit.
- Clearer control flow of the main branch predictor
- Remove `uncondBranch` and `btbUpdate` functions in favor
of a common `historyUpdate` function. There is now only
one lookup function for conditional branches and the new
`historyUpdate` for speculative history update.
- Added a new *target provider* class.
- More expressive statistics depending on the different branch
types.
- Clean up the branch history management
The current hardcoded value does not support vector instructions.
The new ISA string generator function allows the flexibility
of using or not using the vector extension.
Change-Id: Ic78c4b6629ad3813fc172f700d77ea956552e613
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
Currently, we are hardcoding the ISA string in the device tree
generator. The ISA string from the device tree affects which
ISA extensions will be used by the bootloader/kernel.
The new function generates the ISA string from gem5's
ISA object rather than using hardcoded values.
Change-Id: I2f3720fb6da24347f38f26d9a49939484b11d3bb
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
Unfortunately, Actions uses Docker containers to create files on the
system with root permissions. The 'vagrant' user, which we log in as to
run the Actions Runner, can't remove these files. However, 'vagrant' is
part of the sudo group and can therefore use sudo to remove these files.
I don't like this, but it works.
Major refactoring of the branch predictor unit.
- Clearer control flow of the main branch predictor
- Remove `uncondBranch` and `btbUpdate` functions in favour of a
common `historyUpdate` function. There is now only one lookup
function for conditional branches and the new `historyUpdate` for
speculative history update.
- Added a new *target provider* class.
- More expressive statistics depending on the different branch types.
- Clean up the branch history management
Change-Id: I21fa555b5663e4abad7c836fc1d41a9c8b205263
Signed-off-by: David Schall <david.schall@ed.ac.uk>
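The following sketches the two interfaces named above: a common history update hook and a target provider. Per-type counters stand in for the more expressive statistics. All names are hypothetical, and the design is deliberately simplified; it is not gem5's branch predictor unit.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <unordered_map>

// Hypothetical branch categories used to split statistics per type.
enum class BranchType : std::size_t {
    DirectCond, DirectUncond, Return, Indirect, Count
};

// Hypothetical "target provider": anything that predicts a branch target.
class TargetProvider {
  public:
    virtual ~TargetProvider() = default;
    virtual std::optional<std::uint64_t> lookup(std::uint64_t pc) const = 0;
    virtual void update(std::uint64_t pc, std::uint64_t target) = 0;
};

// A trivially simple BTB as one possible target provider.
class SimpleBtb : public TargetProvider {
  public:
    std::optional<std::uint64_t> lookup(std::uint64_t pc) const override
    {
        auto it = table_.find(pc);
        if (it == table_.end())
            return std::nullopt;
        return it->second;
    }
    void update(std::uint64_t pc, std::uint64_t target) override
    {
        table_[pc] = target;
    }

  private:
    std::unordered_map<std::uint64_t, std::uint64_t> table_;
};

// A single speculative history update entry point for every branch type.
class ToyHistory {
  public:
    void historyUpdate(BranchType type, bool taken)
    {
        ghr_ = (ghr_ << 1) | (taken ? 1u : 0u);
        ++perType_[static_cast<std::size_t>(type)];
    }
    std::uint64_t history() const { return ghr_; }
    std::uint64_t count(BranchType type) const
    {
        return perType_[static_cast<std::size_t>(type)];
    }

  private:
    std::uint64_t ghr_ = 0;
    std::array<std::uint64_t, static_cast<std::size_t>(BranchType::Count)>
        perType_{};
};
```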
Some debug registers were incorrectly tagged
(e.g. as being writeable). This was causing a bug in some gem5-KVM runs
where gem5 was trying to initialize the state of those registers
(OSLSR_EL1) [1] but KVM was returning an error (as the registers were
RO).
[1]: https://github.com/gem5/gem5/blob/stable/src/arch/arm/kvm/armv8_cpu.cc#L408
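The following sketches how read-only registers can be skipped when state is pushed into the VM. `MiscRegInfo` and `regsToWrite` are hypothetical. The point is that correct per-register write tags keep a read-only register such as OSLSR_EL1 out of the set of registers written through KVM.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-register metadata.
struct MiscRegInfo {
    std::string name;
    bool writable;
};

// Selects the registers whose state should be pushed into the VM.
// Read-only registers are skipped, since writing them makes KVM fail.
std::vector<std::string> regsToWrite(const std::vector<MiscRegInfo>& regs)
{
    std::vector<std::string> out;
    for (const auto& r : regs) {
        if (r.writable)
            out.push_back(r.name);
    }
    return out;
}
```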
Capstone is an open source disassembler [1] already used by
other projects (like QEMU).
gem5 is already capable of disassembling instructions. Every StaticInst
is supposed to define a generateDisassembly method which returns the
instruction mnemonic (opcode + operand list) as a string.
This "distributed" implementation of a disassembler relies
on the developer to properly populate the metadata fields
of the base instruction class.
The growing complexity of the ISA code and the massive reuse
of base classes beyond their intended use have led to
disassembly logic that contains several bugs.
By allowing a tracer to rely on a third-party disassembler, we fill the
instruction trace with a more trustworthy instruction stream.
This should make trace parsing tools work better, and it will
also allow us to spot and fix our own bugs by comparing instruction
traces produced with the native vs custom disassembler.
[1]: http://www.capstone-engine.org/
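The following sketches the trace comparison mentioned above. Given two traces of the same run, one disassembled by gem5 and one by an external disassembler such as Capstone, it reports the line indices where they disagree. `mismatches` is hypothetical and assumes both traces are already aligned one instruction per line.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the indices at which two instruction traces disagree. Extra
// trailing lines in the longer trace are reported as mismatches too.
std::vector<std::size_t> mismatches(const std::vector<std::string>& traceA,
                                    const std::vector<std::string>& traceB)
{
    std::vector<std::size_t> out;
    const std::size_t n = std::max(traceA.size(), traceB.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (i >= traceA.size() || i >= traceB.size() ||
            traceA[i] != traceB[i])
            out.push_back(i);
    }
    return out;
}
```

Some mismatches will be purely cosmetic, such as `[sp]` vs `[sp, #0]`. Normalizing such syntax differences first would be a reasonable refinement.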