The CL updates the Kconfig:
1. Replace the USE_NULL_ISA with BUILD_ISA
2. The USE_XXX_ISAs are depends on BUILD_ISA
3. If the BUILD_ISA is set, at least one of USE_XXX_ISAs must be set
4. Refactor the USE_KVM option
Change-Id: I2a600dea9fb671263b0191c46c5790ebbe91a7b8
These are not yet consumed by anything, but convert all the settings
from SCons variables to Kconfig variables.
If you have existing SConsopts files which need to be converted, you
should take a look at KCONFIG.md to learn about how kconfig is used in
gem5. You should decide if any variables need to be available to C++ or
kconfig itself, and whether those are options which should be detected
automatically, or should be up to the user. Options which should be
measured automatically should still be in SConsopts files, while user
facing options should be added to new or existing Kconfig files.
Generally, make sure you're storing c++/kconfig visible options in
env['CONF'][...]. Also remove references to sticky_vars since persistent
options should now be handled with kconfig, and export_vars since
everything in env['CONF'] is now exported automatically.
Switch SCons/gem5 to use Kconfig for configuration, except EXTRAS which
is still a sticky SCons variable. This is necessary because EXTRAS also
controls what config options exist. If it came from Kconfig itself, then
there would be a circular dependency. This dependency could
theoretically be handled by reparsing the Kconfig when EXTRAS
directories were added or removed, but that would be complicated, and
isn't supported by kconfiglib. It wouldn't be worth the significant
effort it would take to add it, just to use Kconfig more purely.
Change-Id: I29ab1940b2d7b0e6635a490452d05befe5b4a2c9
mtvec.mode is extended in the new riscv proposal, like fast interrupt.
This change moves that part from Fault class to ISA class for
extendable.
Ref: https://github.com/riscv/riscv-fast-interrupt
gpu-compute: Fix typo with GPUTLB print
Print was not properly ending in a newline, which caused confusion when
looking a trace with GPUTLB enabled. This fixes that.
Symbol type is part of the info provided by an ELF object's symtab.
It indicates whether a symbol is a file symbol, or a function symbol,
etc.
This chain of commits introduces a way to only load function symbols
to the gem5's symbol table. The RISC-V BootloaderKernelWorkload now
loads only function symbols from the bootloader and the kernel binaries
by default.
arch-riscv: Fix implementation of CMO extension instructions
This change introduces a template for store instruction's mem access.
The new template is called CacheBlockBasedStore.
The reasons for not reusing the current Store's mem access template
are as follows,
- The CMO extension instructions operate on cache block size
granularity,
while regular load/store instructions operate on data of size 64 bits or
fewer.
- The writeMemAtomicLE/writeMemTimingLE interfaces do not allow passing
nullptr as data. However, CPUs in gem5 rely on (data == NULL) to detect
CACHE_BLOCK_ZERO instructions. Setting `Mem = 0;` to `uint64_t Mem;`
does not solve the problem as the reference is allocated and thus,
it's always true that `&Mem != NULL`. This change uses the
writeMemAtomic/writeMemTiming interfaces instead.
- Per CMO v1.0.1, the instructions in the spec do not generate
address misaligned faults.
- The CMO extension instructions do not use IMM.
---
arch-riscv: Fix generateDisassembly for Store with 1 source reg
Currently, store instructions are assumed to have two source registers.
However, since we are supporting the RISC-V CMO instructions, which
are Store instructions in gem5 but they only have one source register.
This change allows printing disassembly of Store instructions with
one source register.
---
arch-riscv: Make Zicbom/Zicboz extensions optional in FS mode
Currently, we're enable Zicbom/Zicboz by default. Since those
extensions might be buggy as they are not well-tested, making
those entensions optional allows running simulation where
the performance implication of the instructions do not matter.
Effectively, by turning off the extensions, we simply remove
those extensions from the device tree, so the OS would not
use them. It doesn't prohibit the userspace application to
use those instructions, however.
---
arch-riscv: Add all supporting Z extensions to RISC-V isa string
Currently, we're enable Zicbom/Zicboz by default. Since those
extensions might be buggy as they are not well-tested, making
those entensions optional allows running simulation where
the performance implication of the instructions do not matter.
Effectively, by turning off the extensions, we simply remove
those extensions from the device tree, so the OS would not
use them. It doesn't prohibit the userspace application to
use those instructions, however.
Change-Id: Ib30e98c4c39f741dec5f7d31bd7b832391686840
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
Currently, store instructions are assumed to have two source registers.
However, since we are supporting the RISC-V CMO instructions, which
are Store instructions in gem5 but they only have one source register.
This change allows printing disassembly of Store instructions with
one source register.
Change-Id: I4dd7818c9ac8a89d5e10e77db72248942a25e938
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
This change introduces a template for store instruction's mem access.
The new template is called CacheBlockBasedStore.
The reasons for not reusing the current Store's mem access template
are as follows,
- The CMO extension instructions operate on cache block size granularity,
while regular load/store instructions operate on data of size 64 bits or
fewer.
- The writeMemAtomicLE/writeMemTimingLE interfaces do not allow passing
nullptr as data. However, CPUs in gem5 rely on (data == NULL) to detect
CACHE_BLOCK_ZERO instructions. Setting `Mem = 0;` to `uint64_t Mem;`
does not solve the problem as the reference is allocated and thus,
it's always true that `&Mem != NULL`. This change uses the
writeMemAtomic/writeMemTiming interfaces instead.
- Per CMO v1.0.1, the instructions in the spec do not generate
address misaligned faults.
- The CMO extension instructions do not use IMM.
Change-Id: I323615639a4ba882fe40a55ed32c7632e0251421
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
Currently, we are hardcoding the ISA string in the device tree
generator. The ISA string from the device tree affects which
ISA extensions will be used by the bootloader/kernel.
This function allows generating the ISA string from the gem5's
ISA object rather than using hardcoded values.
This series of changes also correct a couple of hardcoded
RISC-V ISA strings in the standard library, as well as not
enable RVV instructions for the U74 core model.
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
Currently, the kernel's symbols are shifted by `kernel_paddr_offset`,
which is where the kernel is located in the physcial address space.
However, the symbols are mapped to virtual addresses, which stay the
same even though the physical address space is shifted.
This patch removes the offset for the kernel's symbols virtual
addresses.
Change-Id: I7c35f925777220f56bd8c69bba14c267d2048ade
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
As pointed out by [1], Arm doesn't seem to respect the cacheability
attribute when mapping uncacheable memory. This is because the request
is not tagged as uncacheable during SE translation With this patch we
are checking for the cacheability attribute before finalizing
translation
[1]: https://github.com/gem5/gem5/issues/509
Change-Id: I42df0e119af61763971d5766ae764a540055781b
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Currently, we are hardcoding the ISA string in the device tree
generator. The ISA string from the device tree affects which
ISA extensions will be used by the bootloader/kernel.
This function allows generating the ISA string from the gem5's
ISA object rather than using hardcoded values.
Change-Id: I2f3720fb6da24347f38f26d9a49939484b11d3bb
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
Some debug registers were incorrectly tagged
(e.g. as being writeable). This was causing a bug in some gem5-KVM runs
where gem5 was trying to initialize the state of those registers
(OSLSR_EL1) [1] but KVM was returning an error (as the registers were
RO).
[1]: https://github.com/gem5/gem5/blob/stable/\
src/arch/arm/kvm/armv8_cpu.cc#L408
We move it to the child class which is what the TarmacTracer
actually uses.
Change-Id: Ia30892723d2e1f7306dae87c6c9c1d69d00ad73d
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This patch fixes the size of the memory acceses in vswhole and
vlwhole instructions to the maximum vector length.
Change-Id: Ib86b5356d9f1dfa277cb4b367893e3b08242f93e
This patch adds elen as a member of vector configuration instructions so it can be used with the especulative execution
Change-Id: Iaf79015717a006374c5198aaa36e050edde40cee
In first place, vlen is added as a member of Vector Macro Instructions
where it is needed to split the instruction in Micro Instructions.
Then, new PCState methods are used to get dynamic vlen and vlenb
values at execution.
Finally, vector length data types are fixed to 32 bits so every vlen value
is considered.
Change-Id: I5b8ceb0d291f456a30a4b0ae2f58601231d33a7a
This patch add vlen definition to the riscv decoder so it can be used in Vector Instruction Constructors
Change-Id: I52292bc261c43562b690062b16d2b323675c2fe0
This path redefine VecRegContainer for RISCV so it can hold every VLEN + ELEN possible configuration used at execution time
Change-Id: Ie6abd01a1c4ebe9aae3d93f4e835fcfdc4a82dcd
Misc Regs might contain rather important information about the state of
a core, e.g., information in CSR registers.
This patch enforces copying the CSR registers when switching cpus. The
bug and the proposed fix are reported here [1].
[1] https://github.com/gem5/gem5/issues/451
Change-Id: I611782e6e3bcd5530ddac346342a9e0e44b0f757
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
This is similar to [1] and [2].
Essentially, the VS bits of STATUS CSR keep track of the state of
the vector registers. (VS bits == DIRTY) means the content of vector
registers have been updated since the last time the VS bits were
updated.
This chain of changes is supposed to change the VS bits to DIRTY for if
any
vector register is potentially updated.
[1] https://gem5-review.googlesource.com/c/public/gem5/+/65272
[2] https://github.com/gem5/gem5/pull/370
Change-Id: I0427890dadc63b74a470d7405807dcfcad18005b
After removing the setRegOperand in VecRegOperand
https://github.com/gem5/gem5/pull/341. The vmask_vm_micro will not write
back to register because tmp_d0 is not the reference type. The PR will
make tmp_d0 as reference of regFile.
Change-Id: I2a934ad28045ac63950d4e2ed3eecc4a7d137919
ThumbEE had already been removed but there were still some
references to it dangling around. We were also signaling
ThumbEE as being available through HWCAPS in SE which
was not correct. This patch is fixing it
Change-Id: I8b196f5bd27822cd4dd8b3ab3ad9f12a6f54b047
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Jazelle state has been officially removed in Armv8.
Every AArch32 implementation must still support the
"Trivial Jazelle implementation", which means that while
the instruction set has been removed, it is still possible
for privileged software to access some Jazelle registers
like JIDR,JMCR, and JOSCR which are just treated as RAZ
Change-Id: Ie403c4f004968eb4cb45fa51067178a550726c87
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This is a standard compare and swap but implemented on vector memory
buffer instructions (i.e., it is the same as FLAT_ATOMIC_CMPSWAP with
MUBUF's special address calculation).
This was tested using a Tensile kernel, a backend for rocBLAS, which is
used by PyTorch and Tensorflow. Prior to this patch both ML frameworks
crashed. With this patch they both make forward progress.
Change-Id: Ie76447a72d210f81624e01e1fa374e41c2c21e06
This instruction is used by ML frameworks to prioritize certain
wavefronts. Since gem5 does not have any support for wavefront
scheduling based on priority (besides wavefront age), we ignore this
instruction and warn_once rather than calling panic. Since hardware can
override this priority anyways, we can be sure that ignoring the value
will not inhibit forward progress resulting in application hangs.
Change-Id: Ic5eef14f9685dd2b316c5cf76078bb78d5bfe3cc
This is a standard compare and swap but implemented on vector memory
buffer instructions (i.e., it is the same as FLAT_ATOMIC_CMPSWAP with
MUBUF's special address calculation).
This was tested using a Tensile kernel, a backend for rocBLAS, which is
used by PyTorch and Tensorflow. Prior to this patch both ML frameworks
crashed. With this patch they both make forward progress.
Change-Id: Ie76447a72d210f81624e01e1fa374e41c2c21e06
This adds the [pyupgrade](https://github.com/asottile/pyupgrade) hook to
pre-commit.
This hook automatically upgrades the syntax to the recommended standards
for the newer version of the language.
Memory instructions acquire coalescer tokens in the schedule stage.
Currently this is only done for buffer and flat instructions, but not
flat global or flat scratch. This change now acquires tokens for flat
global and flat scratch instructions. This provides back-pressure to the
CUs and helps to avoid deadlocks in Ruby.
The change also handles returning tokens for buffer, flat global, and
flat scratch instructions. This was previously only being done for
normal flat instructions leading to deadlocks in some applications when
the tokens were exhausted.
To simplify the logic, added a needsToken() method to GPUDynInst which
return if the instruction is buffer or any flat segment.
The waitcnts were also incorrect for flat global and flat scratch. We
should always decrement vmem and exp count for stores and only normal
flat instructions should decrement lgkm. Currently vmem/exp are not
decremented for flat global and flat scratch which can lead to deadlock.
This change set fixes this by always decrementing vmem/exp and lgkm only
for normal flat instructions.
Change-Id: I673f4ac6121e4b5a5e8491bc9130c6d825d95fc5